Data processing system with improved latency and associated methods

ABSTRACT

A configurable integrated-circuit device includes a plurality of regions that each contain electronic circuitry. The configurable integrated-circuit device also includes common circuitry adapted to provide at least one signal to at least two regions of the plurality of regions. The common circuitry and the at least two regions are positioned within the configurable integrated-circuit device so as to improve the latencies of the at least one signal to each of the at least two regions.

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This patent application claims priority to Provisional U.S. Patent Application Serial No. 60/238,986, Attorney Docket No. ALTR:002PZ1, filed on Oct. 10, 2000.

[0002] Furthermore, this patent application relates to concurrently filed U.S. patent application Ser. No. ______, Attorney Docket No. ALTR:003, Client Reference No. A640, titled “Apparatus and Methods for Fast Conditional-Instruction Controller Circuitry in Data-Processing Systems.”

TECHNICAL FIELD OF THE INVENTION

[0003] This invention relates to electronic data-processing systems and, more particularly, to improving latency in a real-time operating environment.

BACKGROUND

[0004] Some computer and data-processing applications demand real-time response from the computer system. For example, telecommunication and data-communication applications often involve real-time processing of network data as a result of multimedia, audio, and video applications and user data. The real-time demand often places stringent requirements on the hardware, software, or both.

[0005] To address the demands of real-time applications, designers employ sophisticated system designs with correspondingly complex hardware and software components. Complex hardware designs typically include high levels of integration and reside on a large silicon die, for example, within programmable gate-arrays (PGAs), field-programmable gate arrays (FPGAs), programmable logic-devices (PLDs), or complex programmable logic-devices (CPLDs).

[0006] Providing a high-performance system with good real-time response entails designing sophisticated software (both embedded software and the development tools accompanying the data-processing system) and high-speed hardware circuitry. Designing a system with still higher performance requires optimized layout of the circuitry on the silicon die. Thus, an optimal system design includes an appropriate mix of the three performance factors, i.e., hardware performance, software performance, and silicon layout. Focusing on any one of these factors may create a partly optimized system, but not an optimal overall solution. Worse yet, failing to provide an optimal silicon layout may negate the advantages of optimized hardware, software, or both. Unfortunately, no comprehensive design approach exists that provides an optimal mix of software performance, hardware performance, and silicon-layout performance.

[0007] Another aspect of high-performance data-processing systems relates to the execution of instructions within the processor circuitry. Software instructions that operate on data present within the processor (e.g., within the processor's internal registers) often execute at high speeds, i.e., with low latency. On the other hand, software instructions that operate on data outside the processor, for example, conditional-branch instructions, typically suffer from increased latency. The latency results from the time interval it takes the processor to obtain the data and to operate on them to execute the instructions. Typically, a source outside the processor interrupts the processor. In response, the processor uses an interrupt service routine to obtain the data from the outside source, and executes the instruction or instructions that operate on the data. The period from the initial interrupt to the execution of the instruction within the processor often takes many clock cycles. The interrupt-driven scheme results in reduced system performance because of the increased latency. Thus, a need exists for improved latency in the data-processing system when it executes instructions that operate on data residing outside the processor circuitry.

SUMMARY OF THE INVENTION

[0008] One aspect of this invention contemplates apparatus for improving latency in data-processing systems. In one embodiment, a configurable integrated-circuit device according to the invention includes a plurality of regions and a common circuitry. Each of the plurality of regions of the integrated-circuit device includes configurable electronic circuitry. The common circuitry provides at least one signal to at least two regions of the plurality of regions. The common circuitry and the at least two regions are positioned within the configurable integrated-circuit device so as to improve the latencies of the at least one signal to each of the at least two regions. A data-processing system according to the invention includes the configurable integrated-circuit device and at least one peripheral circuitry coupled to the configurable integrated-circuit device.

[0009] In another embodiment, a configurable integrated-circuit device according to the invention includes a plurality of regions that each include electronic circuitry, and a common circuitry. The common circuitry provides at least one signal to at least two regions of the plurality of regions. The common circuitry and the at least two regions are positioned within the configurable integrated-circuit device so that the latencies of the at least one signal to each of the at least two regions tend to be equalized. A data-processing system according to the invention includes the configurable integrated-circuit device and at least one peripheral circuitry coupled to the configurable integrated-circuit device.

[0010] In a third embodiment, a programmable logic device (PLD) according to the invention includes a plurality of regions that each include configurable electronic circuitry, and bus circuitry that couples to the plurality of regions. The PLD also includes common circuitry that couples to the bus circuitry. The common circuitry provides a signal to the plurality of regions through the bus circuitry. The common circuitry and the plurality of regions are positioned within the PLD so as to improve the latencies of the signal to each region. A data-processing system according to the invention includes the PLD and at least one peripheral circuitry coupled to the PLD.

[0011] Another aspect of the invention contemplates methods for improving latency in data-processing systems. In a first embodiment, a method according to the invention for improving latency in a configurable integrated-circuit device includes providing the configurable integrated-circuit device, and partitioning the configurable integrated-circuit device into a plurality of regions that each include configurable electronic circuitry. The method also includes within the integrated-circuit device a common circuitry that provides at least one signal to at least two regions of the plurality of regions. The method positions the common circuitry and the at least two regions within the configurable integrated-circuit device so as to improve the latencies of the at least one signal to each of the at least two regions.

[0012] In a second embodiment, a method according to the invention for improving latency in a configurable integrated-circuit device includes providing the configurable integrated-circuit device, and partitioning the configurable integrated-circuit device into a plurality of regions that each include configurable electronic circuitry. The method also includes within the configurable integrated-circuit device a common circuitry that provides at least one signal to at least two regions of the plurality of regions. The method positions the common circuitry and the at least two regions within the configurable integrated-circuit device so that the latencies of the at least one signal to each of the at least two regions tend to be equalized.

DESCRIPTION OF THE DRAWINGS

[0013] The appended drawings illustrate only exemplary embodiments of the invention and therefore do not limit its scope, because the inventive concepts lend themselves to other equally effective embodiments. Like reference numerals in the drawings identify the same, similar, or equivalent components, blocks, circuitry, structure, or functionality.

[0014] FIG. 1 illustrates a data-processing system that includes an integrated-circuit device according to the invention.

[0015] FIG. 2 shows another data-processing system that includes an integrated-circuit device according to the invention.

[0016] FIG. 3 illustrates a data-processing system that constitutes a variation of the system of FIG. 1.

[0017] FIG. 4 shows a data-processing system that constitutes a variation of the system of FIG. 2.

[0018] FIG. 5A illustrates a block diagram of an embodiment of a data-processing system according to the invention.

[0019] FIG. 5B depicts a block diagram of a page register circuitry according to the invention.

[0020] FIG. 6 shows a block diagram of another embodiment of a data-processing system according to the invention.

[0021] FIG. 7 depicts a more detailed block diagram of an embodiment of a data-processing system according to the invention.

[0022] FIG. 8 illustrates a more detailed block diagram of another embodiment of a data-processing system according to the invention.

[0023] FIG. 9A shows an embodiment of an integrated-circuit device layout according to the invention.

[0024] FIG. 9B depicts a second embodiment of an integrated-circuit device layout according to the invention.

[0025] FIG. 10A illustrates a third embodiment of an integrated-circuit device layout according to the invention.

[0026] FIG. 10B shows a fourth embodiment of an integrated-circuit device layout according to the invention.

[0027] FIG. 10C depicts a fifth embodiment of an integrated-circuit device layout according to the invention.

[0028] FIG. 10D illustrates a sixth embodiment of an integrated-circuit device layout according to the invention.

[0029] FIG. 11 shows a more detailed block diagram of an integrated-circuit device layout according to the invention.

[0030] FIG. 12 depicts a flowchart for an exemplary algorithm that includes conditional-branch instructions.

[0031] FIG. 13 illustrates a zero-page memory circuitry according to the invention for use in a 32-bit data-processing system.

[0032] FIG. 14A shows a diagram of a zero-page memory circuitry for use in a fast conditional-instruction controller (FCIC) circuitry according to the invention.

[0033] FIG. 14B depicts another diagram of a zero-page memory circuitry for use in a FCIC circuitry according to the invention.

[0034] FIG. 15 illustrates an example of a timing diagram that corresponds to a fast conditional instruction controller (FCIC) circuitry according to the invention.

[0035] FIG. 16A shows an embodiment of a memory-cell in a zero-page memory circuitry according to the invention.

[0036] FIG. 16B depicts another embodiment of a memory-cell in a zero-page memory circuitry according to the invention. This embodiment uses a current source that a plurality of memory cells share.

DETAILED DESCRIPTION OF THE INVENTION

[0037] The invention provides a performance-scalable architecture for a data-processing apparatus. A performance-scalable architecture allows a system designer to plan the system design and architecture to meet a desired set of system specifications. For example, a performance-scalable architecture allows a designer to include more or less memory in a given system, depending on factors such as the complexity of the application, amount of data to process, and the like. The scalable architecture addresses the low-latency needs of high-performance applications, yet also works in lower-performance applications where latency specifications may be less stringent. One aspect of the invention provides a solution to the problem of providing a data-processing environment with improved real-time response, i.e., low latency. The invention provides a data-processing system that has optimized software, hardware, and integrated-circuit device layout.

[0038] The data-processing environment according to the invention provides an optimal overall system and, thus, improved data latency. The data-processing system includes an integrated-circuit device with a plurality of regions and a common circuitry. The plurality of regions preferably include programmable or reconfigurable electronic circuitry, for example, reconfigurable logic circuitry, processor circuitry, embedded controller circuitry, etc. The plurality of regions may also include storage circuitry, for example, memory circuitry, latches, or flip-flops, as desired. The storage circuitry may be programmable or reconfigurable, as desired.

[0039] The plurality of regions may also include interconnect circuitry, preferably reconfigurable or programmable interconnect circuitry. The interconnect circuitry may couple the reconfigurable circuitry to other circuitry within a region and/or other parts of the integrated-circuit device, as desired.

[0040] One may program or configure the circuitry within the integrated-circuit device during initialization of the device by specifying the type of circuitry that the device will realize, the connections among those circuitries, and the outputs and inputs of those circuitries. One may also reprogram or reconfigure the integrated-circuit device during its operation, as desired. In other words, by providing configuration data to the integrated-circuit device during its operation, one may cause its re-programming or re-configuration in real time or nearly in real time. The configuration data may reside on chip or may come from an external source, as desired.

[0041] The common circuitry couples to, and provides at least one signal to, the plurality of regions of the integrated-circuit device. The common circuitry and the plurality of regions are positioned within the integrated-circuit device so as to improve the latency of the at least one signal provided to the plurality of regions of the integrated-circuit device.

[0042] Data-processing systems according to the invention use optimized software to achieve high levels of performance. During the system design process, the designer plans the hardware implementation and architecture. The system designer then selects a particular operating system. The designer may optimize the operating system and other software, for example, application software, to fit the characteristics of the system hardware and its architecture. By optimizing the software for the hardware architecture, the system designer arrives at a system design that provides higher performance.

[0043] Another aspect of the invention provides a data-processing system with improved latency when the system executes instructions that operate on data or signals residing outside, or external to, a processor circuitry within the system. For example, the processor circuitry within the data-processing system may execute a conditional-branch instruction. The conditional-branch instruction typically requires the processor circuitry to determine the state of conditional-instruction data. The conditional-instruction data correspond to a signal or set of signals that the processor examines in order to execute a conditional instruction. In other words, the flow of instruction execution depends on the conditional-instruction data, and the processor circuitry makes a decision based on the conditional-instruction data.

[0044] In traditional data-processing systems, the source of the external signal or data uses an interrupt system to request service by a processor circuitry. In other words, the external source causes or initiates an interrupt to alert the processor circuitry. The processor circuitry responds by suspending its current operation (typically by saving its state and pertinent data on a stack), and then executing a routine to service the interrupt. After servicing the interrupt, the processor circuitry resumes its original task. The external source typically must wait for a relatively long time before the processor services the interrupt. Thus, the overall system exhibits considerable latency.

[0045] As an alternative to interrupt-driven systems, some data-processing systems use input registers. In those systems, a number of registers store the state of the external variables (i.e., the conditional-instruction data). The processor examines the contents of a particular register to execute a conditional instruction. Input-register systems generally provide lower latency than interrupt-driven systems. But for large designs (i.e., systems that occupy a physically large silicon area), the interconnect delay between the input signals and the processor circuitry may be large in comparison to the processor cycle time. In some applications, for example, real-time data-processing applications, the large interconnect delay may cause unacceptable or undesirable latency in the system response.
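As a rough illustration of why interconnect delay matters once it exceeds the processor cycle time, the following sketch counts the whole clock cycles a signal spends in transit. The function name and the delay and cycle-time figures are hypothetical and chosen only for illustration; they do not describe any particular device.

```python
import math


def interconnect_wait_cycles(interconnect_delay_ns: float, cycle_time_ns: float) -> int:
    """Return the number of whole processor cycles spanned by an interconnect delay."""
    if interconnect_delay_ns < 0 or cycle_time_ns <= 0:
        raise ValueError("delay must be non-negative and cycle time must be positive")
    # A signal still in flight at a clock edge cannot be used until the next edge,
    # so the delay is rounded up to a whole number of cycles.
    return math.ceil(interconnect_delay_ns / cycle_time_ns)


# Hypothetical figures: a 4 ns processor cycle with a short and a long route.
short_route = interconnect_wait_cycles(2.0, 4.0)
long_route = interconnect_wait_cycles(10.0, 4.0)
```

Under these assumed numbers the long route costs three cycles where the short route costs one, which is the kind of penalty that careful placement of the circuitry aims to reduce.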

[0046] To improve response latency, data-processing systems according to the invention include fast conditional-instruction controller (FCIC) circuitry. The FCIC circuitry improves the system's latency by providing the conditional-instruction data, which typically reside outside the processor circuitry, to a zero-page memory circuitry. The conditional-instruction data may alter the contents of one or more locations within the zero-page memory by an asynchronous-write, synchronous-read circuitry according to the invention. The source of the conditional-instruction data writes the conditional-instruction data to the zero-page memory asynchronously. The processor circuitry accesses the data by synchronously reading the contents of the zero-page memory. Thus, by using the FCIC circuitry, data-processing systems according to the invention avoid using interrupt-driven or register-input systems that typically suffer from poor latency.
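The asynchronous-write, synchronous-read behavior can be sketched in software as a purely behavioral model. The class name, the method names, the flag address, and the choice to sample the cells on each clock edge are all assumptions made for illustration; they are not the circuit-level implementation of the zero-page memory.

```python
class ZeroPageMemory:
    """Behavioral model: writes land at any time, reads see the last clock-edge sample."""

    def __init__(self, size: int = 256) -> None:
        self._cells = [0] * size    # updated asynchronously by external sources
        self._sampled = [0] * size  # snapshot taken on each processor clock edge

    def async_write(self, address: int, value: int) -> None:
        """An external source changes a cell without waiting for the processor clock."""
        self._cells[address] = value

    def clock_edge(self) -> None:
        """Model one processor clock edge by sampling every cell."""
        self._sampled = list(self._cells)

    def sync_read(self, address: int) -> int:
        """The processor reads the value captured at the most recent clock edge."""
        return self._sampled[address]


def conditional_branch(memory: ZeroPageMemory, flag_address: int) -> str:
    """Decide a branch from conditional-instruction data held in zero-page memory."""
    return "taken" if memory.sync_read(flag_address) != 0 else "not taken"


zero_page = ZeroPageMemory()
FLAG_ADDRESS = 0x10  # hypothetical location of one condition flag

before_write = conditional_branch(zero_page, FLAG_ADDRESS)
zero_page.async_write(FLAG_ADDRESS, 1)  # external event, no interrupt raised
before_edge = conditional_branch(zero_page, FLAG_ADDRESS)
zero_page.clock_edge()
after_edge = conditional_branch(zero_page, FLAG_ADDRESS)
```

In this model the processor never services an interrupt: the external write becomes visible at the next clock edge, and the branch decision is an ordinary memory read.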

[0047] FCIC circuitry according to the invention provides several advantages over conventional approaches. First, the FCIC circuitry allows a memory-mapped design approach for the system memory, thus simplifying and streamlining the design process. Second, the designer need not use any additional decoding logic to implement the memory-mapped design. Instead, the designer may use that hardware to implement other desired functionality. Third, unlike a fixed hardware implementation, the FCIC circuitry according to the invention allows changing of the software arbitration depending on the application, thus providing more design flexibility and versatility.

[0048] Real-Time Operating Environment with Improved Latency

[0049] FIG. 1 shows a data-processing system 1000 that includes an integrated-circuit (IC) circuitry 1005 and at least one peripheral circuitry 1010. The peripheral circuitries 1010 couple to the IC circuitry 1005 via signal links 1015. The IC circuitry 1005 may include an FCIC circuitry 1035 according to the invention. The IC circuitry 1005 resides within an integrated-circuit device and preferably includes one or more processor circuitries (not shown explicitly). The processor circuitry may operate on the data within the system and decode and execute instructions. The peripheral circuitries 1010 may include a variety of devices or circuits, as persons skilled in the art will recognize. For example, the peripheral circuitries 1010 may include communication or telecommunication circuitry, video circuitry, audio circuitry, input circuitry, output circuitry, storage circuitry, memory circuitry, and network circuitry, as desired.

[0050] FIG. 2 shows another example of a system 2000 that includes an IC circuitry 1005 according to the invention. The system 2000 also includes a plurality of peripheral devices that couple to the IC circuitry 1005. The IC circuitry 1005 resides within an integrated-circuit device and preferably includes one or more processor circuitries (not shown explicitly). The processor circuitry may operate on the data within the system and decode and execute instructions. The IC circuitry 1005 may also include an FCIC circuitry 1035 according to the invention. The peripheral devices include a memory circuitry 2005 (e.g., SDRAM circuitry and associated controller), an output circuitry 2010 (e.g., a printer), a storage circuitry 2015 (e.g., a hard drive), an input circuitry 2020 (e.g., a keyboard), a communication circuitry 2025 (e.g., a modem), an audio circuitry 2030 (e.g., speakers), a video circuitry 2035 (e.g., a video controller, monitor, or both), a network circuitry 2040 (e.g., an Ethernet controller or network interface card), and I/O circuitry 2045 (e.g., game controller or joystick).

[0051] The peripheral circuitries 2005-2045 couple to the IC circuitry 1005 via signal links 2050-2090, respectively. As persons skilled in the art will recognize, however, one may use other peripheral devices and circuits, as desired. Furthermore, one may include more or fewer peripheral devices or circuits than FIG. 2 shows, as desired.

[0052] FIG. 3 shows an alternative embodiment of a system 3000 that includes an IC circuitry 1005 according to the invention. The IC circuitry 1005 may include an FCIC circuitry 1035 according to the invention. The IC circuitry 1005 resides within an integrated-circuit device and preferably includes one or more processor circuitries (not shown explicitly). The processor circuitry may operate on the data within the system and decode and execute instructions.

[0053] The system 3000 also includes a plurality of peripheral circuitries 1010 that couple to the IC circuitry 1005 via an interface circuitry 1030. The peripheral circuitries 1010 may include a variety of devices or circuits, as persons skilled in the art will recognize. For example, the peripheral circuitries 1010 may include communication or telecommunication circuitry, video circuitry, audio circuitry, input circuitry, output circuitry, storage circuitry, memory circuitry, and network circuitry, as desired.

[0054] The peripheral circuitries 1010 couple to the interface circuitry 1030 via signal links 1015. The interface circuitry 1030 couples to the IC circuitry 1005 via a signal link 1040. Using the signal links 1015 and 1040, the interface circuitry 1030 communicates data signals, control signals, or both, between the IC circuitry 1005 and the peripheral circuitries 1010. The interface circuitry 1030 may also control the operation of the peripheral circuitries 1010, either individually, or with the supervision of the IC circuitry 1005, as desired.

[0055] FIG. 4 illustrates another alternative embodiment of a system 4000 that includes an IC circuitry 1005 according to the invention. The IC circuitry 1005 may include an FCIC circuitry 1035 according to the invention. The IC circuitry 1005 resides within an integrated-circuit device and preferably includes one or more processor circuitries (not shown explicitly). The processor circuitry may operate on the data within the system and decode and execute instructions.

[0056] The system 4000 also includes a plurality of peripheral circuitries 2005-2045 that couple to the IC circuitry 1005 via a corresponding plurality of interface circuitries 4005-4045. The peripheral devices include a memory circuitry 2005 (e.g., SDRAM circuitry and associated controller), an output circuitry 2010 (e.g., a printer), a storage circuitry 2015 (e.g., a hard drive), an input circuitry 2020 (e.g., a keyboard), a communication circuitry 2025 (e.g., a modem), an audio circuitry 2030 (e.g., speakers), a video circuitry 2035 (e.g., a video controller, monitor, or both), a network circuitry 2040 (e.g., an Ethernet controller or network interface card), and I/O circuitry 2045 (e.g., game controller or joystick). As persons skilled in the art will recognize, however, one may use other peripheral devices and circuits, as desired. Furthermore, one may include more or fewer peripheral devices or circuits than FIG. 4 shows, as desired.

[0057] The peripheral circuitries 2005-2045 couple to the interface circuitries 4005-4045 via signal links 2050-2090, respectively. In a similar manner, the interface circuitries 4005-4045 couple to the IC circuitry 1005 via signal links 4050-4090, respectively. Using the signal links 2050-2090 and 4050-4090, the interface circuitries 4005-4045 communicate data signals, control signals, or both, between the IC circuitry 1005 and the peripheral circuitries 2005-2045. The interface circuitries 4005-4045 may also control the operation of the peripheral circuitries 2005-2045, either individually, or with the supervision of the IC circuitry 1005, as desired.

[0058] FIG. 5A depicts an embodiment 5000A of a data-processing system according to the invention. The data-processing system preferably resides within a CPLD (other than external components or circuitries), although one may use other implementations, as desired, depending on system specifications. The embodiment 5000A includes a bus 5005 (labeled “Bus 1”) and a bus 5010 (labeled “Bus 2”). Bus 5005 and bus 5010 may include address signals, data signals, or both, as desired, depending on a particular system configuration. Bus 5005 couples to a processor circuitry 5035 (labeled “Processor Circuitry 1”).

[0059] The processor circuitry 5035 generally constitutes a data processor, for example, a microprocessor, a microcontroller, or a digital signal-processor (DSP). Depending on a particular application, one may use other types of processors, as persons skilled in the art would understand. In a preferred embodiment, the processor circuitry 5035 constitutes a DSP, particularly a multi-threaded DSP (i.e., a DSP that can run a number of parallel processes). One may use the DSP as a real-time enhanced minimum instruction-set computer (MISC) to provide a very small core that runs at high clock rates. One may also use the DSP to execute the latency-dependent parts of the application code.

[0060] Bus 5005 also couples to CPLD ports circuitry 5015, SRAM circuitry 5025, and FCIC circuitry 1035 (described below in more detail). The FCIC circuitry 1035 accepts FCIC inputs 5100 (e.g., conditional-instruction data). The CPLD ports circuitry 5015 facilitates coupling to, and communicating with, one or more regions (e.g., quadrants) of the CPLD device in which the circuitry of FIG. 5A resides. For example, CPLD ports circuitry 5015 may include ports configured to couple to quadrants 2 and 4 of the CPLD (described below in more detail in connection with FIG. 11).

[0061] The SRAM circuitry 5025 provides high-speed static RAM to the processor circuitry 5035. If desired, bus 5005 may also couple to a DRAM circuitry 5050 that provides additional memory for the processor circuitry 5035. Note that one may use other types of memory circuitry than SRAM and DRAM, as persons skilled in the art would understand. Regardless of the type of memory, the memory circuitry or circuitries provide storage for the processor circuitry 5035.

[0062] Bus 5010 couples to a processor circuitry 5075 (labeled “Processor Circuitry 2”). The processor circuitry 5075 generally constitutes a data processor, for example, a microprocessor, a microcontroller, or a digital signal-processor (DSP). Depending on a particular application, one may use other types of processors, as persons skilled in the art would understand. In a preferred embodiment, the processor circuitry 5075 constitutes a controller/processor manufactured by Advanced RISC Machines (ARM), for example an ARM 922 processor that provides a number of single-cycle operations or instructions.

[0063] Bus 5010 also couples to CPLD ports circuitry 5085, SRAM circuitry 5080, and dual-port memory circuitry 5060. The CPLD ports circuitry 5085 facilitates coupling to, and communicating with, regions (e.g., quadrants) of the CPLD device in which the circuitry of FIG. 5A resides. For example, CPLD ports circuitry 5085 may include ports configured to couple to quadrants 1 and 3 of the CPLD (described below in more detail in connection with FIG. 11). The SRAM circuitry 5080 provides high-speed static RAM to the processor circuitry 5075. If desired, bus 5010 may also couple to a DRAM circuitry 5070 that provides additional memory for the processor circuitry 5075. Note that one may use other types of memory circuitry than SRAM and DRAM, as persons skilled in the art would understand. Regardless of the type of memory, the memory circuitry or circuitries provide storage for the processor circuitry 5075.

[0064] The dual-port memory circuitry 5060 couples to both bus 5005 and bus 5010. The dual-port memory circuitry 5060 facilitates communication, interfacing, or data interchange between the processor circuitry 5035 and the processor circuitry 5075. In other words, the processor circuitry 5035 and the processor circuitry 5075 each couple to one port of the dual-port memory circuitry 5060.

[0065] The dual-port memory circuitry 5060 includes page register circuitry 5065. The page register circuitry 5065 provides an arbitration mechanism for the processor circuitry 5035 and processor circuitry 5075. The arbitration mechanism allows the processor circuitry 5035 and the processor circuitry 5075 to use the dual-port memory circuitry 5060 without conflict. In the absence of an arbitration mechanism, the processor circuitry 5035 and the processor circuitry 5075 may both attempt to access the dual-port memory circuitry 5060 simultaneously. By doing so, the processor circuitry 5035 and the processor circuitry 5075 would potentially cause a conflict that may threaten the stability and error-free operation of the system. Using the page register circuitry 5065 helps avoid the conflict.

[0066] To further reduce the risk of conflict and contention between the processor circuitry 5035 and the processor circuitry 5075, one may use a dual-port memory circuitry 5060 that includes two memory banks. For one bank, the processor circuitry 5035 has write-only access, whereas the processor circuitry 5075 has read-only access. The converse configuration applies to the other memory bank. In other words, for the other bank, the processor circuitry 5035 has read-only access, whereas the processor circuitry 5075 has write-only access. One may then use software routines to perform arbitration between the processor circuitry 5035 and the processor circuitry 5075.
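The two-bank access rule can be modeled in software as follows. This is a minimal sketch only: the class name, the bank labels "A" and "B", and the processor labels "P1" (standing in for processor circuitry 5035) and "P2" (standing in for processor circuitry 5075) are hypothetical, and the permission check stands in for whatever the hardware enforces.

```python
class TwoBankDualPortMemory:
    """Each bank has exactly one writer and one reader, with the roles swapped per bank."""

    def __init__(self, bank_size: int = 16) -> None:
        self._banks = {"A": [0] * bank_size, "B": [0] * bank_size}
        self._writer = {"A": "P1", "B": "P2"}  # write-only side of each bank
        self._reader = {"A": "P2", "B": "P1"}  # read-only side of each bank

    def write(self, processor: str, bank: str, address: int, value: int) -> None:
        if self._writer[bank] != processor:
            raise PermissionError(f"{processor} may not write bank {bank}")
        self._banks[bank][address] = value

    def read(self, processor: str, bank: str, address: int) -> int:
        if self._reader[bank] != processor:
            raise PermissionError(f"{processor} may not read bank {bank}")
        return self._banks[bank][address]


memory = TwoBankDualPortMemory()
memory.write("P1", "A", 0, 42)      # P1 sends a word to P2 through bank A
memory.write("P2", "B", 0, 7)       # P2 replies through bank B
received_by_p2 = memory.read("P2", "A", 0)
received_by_p1 = memory.read("P1", "B", 0)
```

Because each bank has a single writer, the two processors never write the same location, which is what keeps contention down in this arrangement.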

[0067] If either of the processor circuitries, say, processor circuitry 5035, wishes to access the dual-port memory circuitry 5060, it sets a flag in the page register circuitry 5065. The processor circuitry 5035 then owns the dual-port memory circuitry 5060, i.e., it may access the dual-port memory circuitry 5060. When it has finished using the dual-port memory circuitry 5060, the processor circuitry 5035 resets the flag in the page register circuitry 5065 and, thus, relinquishes control of the dual-port memory circuitry 5060. Processor circuitry 5075 may then access and use the dual-port memory circuitry 5060 by following a similar procedure.
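The set-flag, use, reset-flag sequence can be sketched as a simple ownership model. The class and method names are hypothetical, and the sketch assumes that testing and setting the flag happens as one indivisible step, which the text does not spell out.

```python
from typing import Optional


class PageRegisterFlag:
    """Ownership flag: one processor at a time may own the dual-port memory."""

    def __init__(self) -> None:
        self._owner: Optional[str] = None  # None means the flag is reset

    def try_acquire(self, processor: str) -> bool:
        """Set the flag for this processor if no other processor holds it."""
        if self._owner is None:
            self._owner = processor
            return True
        return self._owner == processor

    def release(self, processor: str) -> None:
        """Reset the flag; only the current owner may do so."""
        if self._owner != processor:
            raise RuntimeError(f"{processor} does not own the memory")
        self._owner = None


flag = PageRegisterFlag()
p1_granted = flag.try_acquire("P1")          # P1 sets the flag and owns the memory
p2_blocked = flag.try_acquire("P2")          # P2 must wait while the flag is set
flag.release("P1")                           # P1 resets the flag when finished
p2_granted_later = flag.try_acquire("P2")    # P2 now follows the same procedure
```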

[0068] One may use the page register circuitry 5065 to achieve logical or hierarchical partitioning of the CPLD according to the invention. The lowest level of structure corresponds to the positioning within the CPLD of the common circuitry and the plurality of regions containing configurable or programmable electronic circuitry. Another level of partitioning provides logical or hierarchical partitioning. Logical or hierarchical partitioning occurs at a higher level than does the positioning of the circuitry within the CPLD. Put another way, logical or hierarchical partitioning takes place at a more abstract level than does partitioning at the physical level.

[0069] Logical or hierarchical partitioning may further reduce system latency by providing for the use of physically aware, functional compiler designs. Traditionally, system designers have allocated the various peripheral functions of the system to different areas of the system memory map. As a result, the addressing time of the memory depends on the location of particular data within the memory map, for example, when using extended addressing mode. By providing in, or confining to, each different partition a different soft instantiation of an intellectual-property (IP) macro or function, one may page each partition into the same location in the system memory map, as desired.

[0070] Logical or hierarchical partitioning can provide virtual functionality that has performance advantages in the processing of some types of algorithms, for example, polynomial-based algorithms. Furthermore, using an architectural compiler to distribute the IP functions into the different physical regions of the CPLD allows certain advantages, for example, a more structured and layered approach to System on a Programmable Chip (SOPC) architectures.

[0071] One may use the page register circuitry 5065 to implement a paged hardware mode. By using that mode, one may switch the IP functions into the zero-page memory map, thus keeping access time to a minimum. The paged hardware mode may, for example, allow access within a single CPU cycle. One may partition the CPLD into hardware pages that one may switch onto the system buses, if desired, by writing different values into the page register circuitry 5065 and associated registers via control logic (not shown explicitly in FIG. 5A). One may achieve that result by using the page register circuitry 5065 and multiplexers (not shown explicitly in FIG. 5A) for various tiles or domains (further partitions within each of the plurality of regions of the CPLD), up to the level of a full region of the CPLD. In preferred embodiments, the multiplexer circuits perform the task of switching onto the bus or buses the data that correspond to an individual page.
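
As a minimal sketch of the paged hardware mode, and not a definitive implementation, the following Python model treats the page register as a selector for a multiplexer that switches one hardware page at a time into a fixed zero-page window. The class PagedHardware, its methods, and the representation of a page as a list of values are assumptions made purely for illustration.

```python
class PagedHardware:
    """Hypothetical model: a page register selects the hardware page
    that a multiplexer switches onto the bus."""

    def __init__(self, pages):
        # pages maps a page number to the list of values that page exposes.
        self.pages = pages
        self.page_register = 0

    def write_page_register(self, page_number):
        # Writing a different value selects a different hardware page.
        if page_number not in self.pages:
            raise ValueError(f"no hardware page {page_number}")
        self.page_register = page_number

    def read_zero_page(self, offset):
        # The multiplexer forwards the selected page. The processor always
        # uses the same small zero-page offset, whichever page is selected.
        return self.pages[self.page_register][offset]
```

For example, with two pages `{0: [10, 11], 1: [20, 21]}`, reading offset 1 returns 11 while page 0 is selected and 21 after the page register is written with 1; the address seen by the processor does not change.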

[0072] The hierarchical or logical partitioning works as follows. Consider a CPLD that has N regions, where each region includes programmable or configurable electronic circuitry and cells. The memory map for the CPLD also has N sections, each section corresponding to one of the N regions. One may configure the circuitry or cells within each region with a given functionality, for example, a processor (note that one may configure or program the circuitry within each region dynamically, as desired). Thus, one may potentially have N processors running, one in each region of the CPLD.

[0073] The page register circuitry 5065 facilitates paging each region of the CPLD into the CPLD or system memory map. The memory map may constitute the actual memory map for the processor circuitry 5035 or the processor circuitry 5075, as desired. The mapping provides a way for the processor circuitries 5035 and 5075 to communicate with the regions of the CPLD through the zero-page memory circuitry (not shown explicitly in FIG. 5A) of the processor circuitry 5035, the processor circuitry 5075, or both. By using zero-page memory circuitry, the processor circuitry 5035, the processor circuitry 5075, or both, may access the various regions of the CPLD in a shorter amount of time, thus improving response latency.

[0074] During the paging of a region of the CPLD, the contents of the bus (for example, address bus, data bus, or address/data bus, as desired) move into the zero-page memory circuitry for the processor circuitry 5035 or the processor circuitry 5075. As an example, consider a microprocessor circuitry and a buffer circuitry instantiated in, say, the second page of the page register circuitry 5065. Suppose that the microprocessor circuitry is performing some operation with the buffer circuitry. Assume further that the processor circuitry 5035 seeks to access the microprocessor circuitry. One may provide access to the microprocessor circuitry by paging appropriate data into the zero-page memory circuitry for the processor circuitry 5035. Using the paging approach does not require the microprocessor to stop or suspend its operation with the buffer circuitry; it may instead make its data available in the memory map of the processor circuitry 5035 by using the zero-page memory circuitry. The paged architecture therefore provides a fast way of exchanging data between the processor circuitry 5035 or the processor circuitry 5075 and the various regions of the CPLD.

[0075] FIG. 5B depicts an embodiment 5000B according to the invention of the page register circuitry 5065 of FIG. 5A. Referring to FIG. 5B, the page register circuitry 5065 preferably includes three parts or fields: page size 5065A, page number 5065B, and page arbitration 5065C. Each of the parts or fields preferably includes a register. Thus, the page register circuitry 5065 preferably includes a page size register, a page number register, and a page arbitration register.

[0076] The page size 5065A signifies the amount of CPLD resources or circuitry one would use for a particular page. The page size 5065A is scalable. For example, one may specify either a large number of small-size circuitries, or a small number of large-size circuitries, or other combinations, as desired.

[0077] The processor circuitry 5035 or the processor circuitry 5075 uses the page size 5065A to program or configure a CPLD configuration controller circuitry (not shown explicitly in FIG. 5A). The CPLD configuration controller circuitry includes a configuration memory circuitry that the processor circuitries 5035 and 5075 may configure dynamically. The configuration memory circuitry provides dynamic programming or configuration of the CPLD. One may use the page size 5065A to program the configuration memory circuitry to reserve page sizes. The description below of FIG. 7 provides more details of the CPLD configuration controller circuitry and its functionality.

[0078] Referring to FIG. 5B, the page number 5065B signifies at which address of the zero-page memory circuitry a given function will appear. In other words, the page number 5065B specifies pages in the memory map of the zero-page memory circuitry.

[0079] The page arbitration 5065C signifies the priority with which each of the hardware functions will appear in the CPLD. The page arbitration 5065C provides arbitration among the pages within the page register circuitry 5065. The page arbitration 5065C determines the order in which one loads into the zero-page memory circuitry the data that correspond to a desired function. Preferably, one programs or configures the arbitration scheme or algorithm (controlled by the page register circuitry 5065) during the initial CPLD configuration, and it remains static. Note, however, that one may dynamically re-configure the CPLD and the arbitration scheme or algorithm, as desired. The dynamic configuration of the CPLD and the arbitration scheme or algorithm may occur in real-time or in near real-time.
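
The three fields of the page register circuitry 5065 lend themselves to a short illustrative model. The Python sketch below is offered only as an example under stated assumptions: the names PageEntry, load_order, and fits are hypothetical, the page size is counted in arbitrary resource units, and a lower arbitration value is assumed to mean a higher priority. The description does not specify any of these encodings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageEntry:
    function: str     # name of the hardware function occupying the page
    page_size: int    # page size 5065A: CPLD resources used (arbitrary units)
    page_number: int  # page number 5065B: zero-page location of the function
    priority: int     # page arbitration 5065C: lower loads first (assumption)


def load_order(entries):
    """Return function names in the order their data would load into zero-page memory."""
    return [e.function for e in sorted(entries, key=lambda e: e.priority)]


def fits(entries, capacity):
    """Check that the reserved page sizes do not exceed the available resources."""
    return sum(e.page_size for e in entries) <= capacity
```

For instance, entries for hypothetical functions "uart" (priority 3), "mac" (priority 1), and "fir" (priority 2) would load in the order mac, fir, uart under this static scheme. Re-configuring the arbitration dynamically would amount to replacing the priority values.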

[0080] Logical or hierarchical partitioning in systems according to the invention provides the following advantages. First, it reduces latency because of the use of the zero-page memory circuitry. Second, it allows the paging of different concurrent functional blocks as desired. Third, it provides for the concurrent execution of each function within its corresponding hardware page. Fourth, for identical functions, parallel configuration or instantiation by the controller of all of the functions reduces the dynamic reconfiguration time. In other words, the page size register maps each of the pages to the same memory address. Fifth, hierarchical or logical partitioning provides for parallel loading of common initial data vectors for each identical hardware function in a computation. Finally, using a number of soft functional logic instantiations, one in each page of the page register circuitry, allows implementation of a virtual stack design. Note that the above description of logical or hierarchical partitioning also applies to the embodiments shown in FIGS. 6, 7, and 8.

[0081] The system shown in FIG. 5A also includes a power-on-reset (POR) circuitry 5090. The POR circuitry 5090 provides initialization signals to the various system components upon power up. For example, the POR circuitry 5090 may provide a POR signal 5040 to the processor circuitry 5035 and the processor circuitry 5075. In response, the processor circuitry 5035 and the processor circuitry 5075 may reset their internal circuits and perform power-up functions, for example, load an initialization routine, and the like.

[0082] Note that the POR circuitry 5090 may provide power-on-reset signals to other system components, as desired. Depending on the system design, the POR circuitry 5090 may provide more than one power-on-reset signal to the various system components, as desired. For example, the processor circuitry 5035 may have a different power-on-reset requirement than does the processor circuitry 5075. The POR circuitry 5090 may provide a POR signal to each of the processor circuitries 5035 and 5075 that meets the power-on-reset specifications of that particular processor circuitry.

[0083] The system in FIG. 5A also includes a clock generator circuitry 5095. The clock generator circuitry 5095 generates clock signals for the various system components. For example, the clock generator circuitry 5095 may provide a clock signal 5045 to the processor circuitry 5035 and the processor circuitry 5075. Depending on the particular specifications of a given system, the clock generator circuitry 5095 may provide clock signals to other system components, as persons skilled in the art would understand. Moreover, depending on the requirements of the various system blocks, the clock generator circuitry 5095 may provide different clock signals to those blocks, as desired.

[0084] FIG. 6 shows an embodiment 6000 of a data-processing system according to the invention. The data-processing system in FIG. 6 includes similar components to those in the system of FIG. 5A. Unlike the system in FIG. 5A, however, the system shown in FIG. 6 separates the buses into separate data and address/control buses. In other words, bus 5005 in FIG. 5A corresponds to data bus 6015 (labeled “Data Bus 1”) and address/control bus 6005 (labeled “Address/Control Bus 1”) in FIG. 6. Likewise, bus 5010 in FIG. 5A corresponds to data bus 6020 (labeled “Data Bus 2”) and address/control bus 6010 (labeled “Address/Control Bus 2”) in FIG. 6. Each of the data buses facilitates communication of data signals among the system components. The address/control buses, on the other hand, allow the communication of address and control signals. Otherwise, the system in FIG. 6 operates similarly to the system shown in FIG. 5A.

[0085] FIG. 7 shows an embodiment 7000 that provides a more detailed architecture of a data-processing system according to the invention. The architecture of the system in FIG. 7 generally parallels that of FIG. 5A, although with more blocks to show the flexibility of the inventive concepts and their applicability to a variety of applications.

[0086] Similar to the system of FIG. 5A, the system in FIG. 7 includes two buses: bus 5005 and bus 5010. The CPLD ports circuitry 5015, the SRAM circuitry 5025, the processor circuitry 5035, and the FCIC circuitry 1035 function similarly to their counterparts in the system of FIG. 5A. Likewise, the CPLD ports circuitry 5085, the dual-port memory circuitry 5060 (including the page register circuitry 5065), the processor circuitry 5075, and the SRAM circuitry 5080 perform functions similar to their counterparts in the system shown in FIG. 5A.

[0087] The POR circuitry 5090 provides the same general function as its counterpart in the system of FIG. 5A, albeit with some modifications. The POR circuitry 5090 has three outputs 5040A, 5040B, and 5040C. Outputs 5040A, 5040B, and 5040C provide power-on-reset signals to the processor circuitry 5035, the CPLD circuitry (the blocks shown in FIG. 7 or other circuit blocks, as desired), and the processor circuitry 5075, respectively. As noted above, the power-on-reset signals reset the respective circuitries to known initial states or cause them to perform other initialization tasks.

[0088] The clock generator circuitry 5095 also provides the same general function as its counterpart in the system of FIG. 5A, but with some modifications. The clock generator circuitry 5095 has three outputs, 5045A, 5045B, and 5045C. The outputs 5045A, 5045B, and 5045C provide clock signals to the processor circuitry 5035, the processor circuitry 5075, and an external memory circuitry 7055 (described in detail below), respectively. The outputs 5045A, 5045B, and 5045C may constitute multi-phase clock signals, as desired. The outputs 5045A, 5045B, and 5045C may provide different clock signals, depending on the particular specifications of the processor circuitry 5035, the processor circuitry 5075, and the external memory circuitry 7055, respectively, as persons skilled in the art would understand. Thus, the clock signals may have different frequency, phase shift, or amplitude.

[0089] The clock generator circuitry 5095 may also supply clock signals to other parts of the system, as desired, depending on the specifications of a particular application. Put another way, the clock generator circuitry 5095 may supply clock signals to various parts of the system that include one or more clock domains, each with its own particular clocking requirements. The clock generator circuitry 5095 may therefore accommodate a variety of system components with differing clocking specifications.

[0090] The clock generator circuitry 5095 operates in conjunction with a phase-locked loop (PLL) circuitry 7060. The PLL circuitry 7060 provides an output 7065 to the clock generator circuitry 5095. The output 7065 may include one or more signals, depending on a particular application. As noted above, the clock signals (i.e., the outputs 5045A, 5045B, and 5045C) may have different timing, frequency, and amplitude relationships to one another. The clock generator circuitry 5095 uses the output 7065 to provide clock signals that have particular timing relationships to each other. Note that the system may include more than one PLL as desired, depending on the specifications of a particular application. For example, each PLL may provide clock signals that satisfy the clocking requirements of one or more clock domains within the system, as noted above.

[0091] During system power-on, the PLL circuitry 7060 operates together with the POR circuitry 5090 and the clock generator circuitry 5095 to provide initial clock signals to the various system components. During the power-on reset routine, the POR circuitry 5090 programs the PLL circuitry 7060 to provide default clock signals via outputs 5045A, 5045B, and 5045C (as well as any other desired clock signals). The PLL circuitry 7060 couples to the bus 5010. Through bus 5010, the processor circuitry 5075 may program the PLL circuitry 7060 to produce clock signals other than the default clock signals. For example, through programming, the PLL circuitry 7060 may generate clock signals with different frequency, duty cycle, and the like. Note that, rather than, or in addition to, coupling to bus 5010, the PLL circuitry 7060 may couple to bus 5005, as desired. If the PLL circuitry 7060 couples to bus 5005, then the processor circuitry 5035 may program the PLL circuitry 7060. If the PLL circuitry 7060 couples to both bus 5005 and bus 5010, then, by using a protocol (e.g., a mechanism for arbitration and handshaking), either or both of the processor circuitry 5035 and the processor circuitry 5075 may program the PLL circuitry 7060.
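
As one possible illustration of the programming sequence just described, the following Python sketch models default clocks established at power-on reset and later re-programming over whichever bus or buses couple to the PLL. The class PLLModel, its request/release handshake, and the default frequencies are all assumptions invented for this example; the description does not specify a particular arbitration protocol or any clock values.

```python
class PLLModel:
    """Hypothetical software model of the PLL circuitry 7060."""

    # Default output frequencies in MHz; values invented for illustration.
    DEFAULTS = {"5045A": 100.0, "5045B": 100.0, "5045C": 50.0}

    def __init__(self, coupled_buses):
        self.coupled_buses = set(coupled_buses)  # e.g. {5010} or {5005, 5010}
        self.outputs = {}
        self.granted_bus = None  # bus currently holding the programming grant

    def power_on_reset(self):
        # The POR circuitry programs the default clock signals.
        self.outputs = dict(self.DEFAULTS)
        self.granted_bus = None

    def request(self, bus):
        # Simple handshake: grant access to one coupled bus at a time.
        if bus not in self.coupled_buses:
            return False
        if self.granted_bus in (None, bus):
            self.granted_bus = bus
            return True
        return False

    def release(self, bus):
        if self.granted_bus == bus:
            self.granted_bus = None

    def program(self, bus, output, freq_mhz):
        # Only the bus holding the grant may change an output clock.
        if self.granted_bus != bus:
            raise PermissionError(f"bus {bus} does not hold the programming grant")
        if output not in self.outputs:
            raise KeyError(output)
        self.outputs[output] = freq_mhz
```

A PLL constructed with only bus 5010 coupled refuses requests from bus 5005, which mirrors the case in which only the processor circuitry 5075 may program the PLL circuitry 7060.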

[0092] The frame-buffer circuitry 7005 includes a memory circuitry that acts as a buffer for telecommunication applications. For a given application, the system's user would set up the size and the configuration of the memory to match the size and properties of the data packets or frames. Thus, the frame-buffer circuitry 7005 constitutes a scalable auxiliary memory circuitry. The memory circuitry preferably constitutes a random-access memory (RAM), but one may use other types of memory, as persons skilled in the art would understand.

[0093] The CPLD configuration controller circuitry 7010 allows either the processor circuitry 5035 or the processor circuitry 5075 to dynamically configure the CPLD circuitry. If, during the system operation, either the processor circuitry 5035 or the processor circuitry 5075 (or conceivably another system block) seeks to use a particular circuitry, it may create that circuitry dynamically. Either processor circuitry may run a program through the CPLD configuration controller circuitry 7010. The program would cause the creation or instantiation of the desired circuitry. The new circuitry would couple to the bus and become available to other system blocks. Note that the program would use available circuit blocks within the CPLD, for example, programmable logic circuitry, to create or instantiate the desired circuitry.
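
The dynamic creation of circuitry described above may be pictured, in a deliberately simplified way, as bookkeeping over a pool of available circuit blocks. In the Python sketch below, the class ConfigurationController, the block counts, and the dictionary standing in for the bus are hypothetical illustrations, not features recited in this description.

```python
class ConfigurationController:
    """Hypothetical model of the CPLD configuration controller circuitry 7010."""

    def __init__(self, total_blocks):
        self.free_blocks = total_blocks  # unconfigured programmable logic blocks
        self.on_bus = {}                 # instantiated circuitry -> blocks consumed

    def instantiate(self, name, blocks_needed):
        # Create the requested circuitry from available blocks and couple it
        # to the bus so that other system blocks may use it.
        if name in self.on_bus:
            raise ValueError(f"{name} is already instantiated")
        if blocks_needed > self.free_blocks:
            raise RuntimeError(f"not enough free blocks for {name}")
        self.free_blocks -= blocks_needed
        self.on_bus[name] = blocks_needed

    def remove(self, name):
        # Return the blocks to the pool for later re-configuration.
        self.free_blocks += self.on_bus.pop(name)
```

Either processor circuitry could call `instantiate` in this model; which one does so is immaterial to the controller.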

[0094] The Advanced Microcontroller Bus Architecture (AMBA) circuitry 7020 provides an interface to AMBA-compliant peripherals. AMBA is a standard bus from the Advanced RISC Machines (ARM) corporation. The system's user may use the AMBA circuitry 7020 to couple AMBA-compliant peripherals or devices 7022 (labeled “AMBA peripherals”) to the system. The CPLD ports circuitry 5085 provides an output or set of outputs 7015 to the AMBA circuitry 7020 to facilitate communication between circuitry within the CPLD and the AMBA-compliant peripherals. Processor circuitry 5075 may communicate with the AMBA circuitry 7020 through bus 5010 and the CPLD ports circuitry 5085. Similarly, processor circuitry 5035 may communicate with the AMBA circuitry 7020 through bus 5005, the dual-port memory circuitry 5060, bus 5010, and the CPLD ports circuitry 5085.

[0095] Note that rather than, or in addition to, the CPLD ports circuitry 5085, the CPLD ports circuitry 5015 may provide an output or set of outputs to the AMBA circuitry 7020, as desired. Note also that the AMBA standard constitutes only one of a number of bus standards available. The user may use the circuitry 7020 to interface to other types of buses by making modifications that are within the knowledge of a person of ordinary skill in the art.

[0096] The Media-Access Control (MAC) circuitry 7025 allows the system to communicate with a network. For example, the user may couple the system or the CPLD to, say, the Internet, through an Ethernet or other type of network, by using the MAC circuitry 7025. Note that the MAC circuitry 7025 may allow the system or the CPLD to communicate with a wide variety of networks, for example, Ethernet networks, token-ring networks, and the like, as desired.

[0097] The Universal Asynchronous Receiver-Transmitter (UART) circuitry 7030 provides a mechanism for the CPLD or the system to couple to a general-purpose or special-purpose serial port or serial device. The UART circuitry 7030 may support a wide variety of serial-communication interfaces, as desired, and depending on the specifications of a particular application. The UART circuitry 7030 allows serial communication with the system or the CPLD in a variety of situations. For example, the system's user may use the UART circuitry 7030 to connect a data terminal to the system or the CPLD. Likewise, the user may use the UART circuitry 7030 to communicate with the system or the CPLD. As another example, the UART circuitry 7030 may facilitate serial communication with a computer or data-processing system through the telephone lines.

[0098] The external flash-memory circuitry 7035 allows a user to store a system or CPLD configuration for future use. The user may store the configuration and access it periodically to initialize the system, the CPLD, or both, at the time of powering up the system, or at other times, as desired. The configuration may include the programming for the programmable circuitry within the CPLD or other parts of the system. The user may modify the configuration as needed or desired. For example, after the initial configuration, the user may dynamically modify or update the configuration. Note that, rather than using flash memory, one may employ other types of memory, for example, read-only memory (ROM), programmable read-only memory (PROM), electrically-erasable programmable read-only memory (EEPROM), and the like. In applications where the user does not need to modify the configuration after the initial system configuration, the user may use a read-only type of memory, as desired.

[0099] The external memory circuitry 7055 provides the system's main memory. Although the CPLD includes some memory, some applications may require a larger amount of memory. The external memory circuitry 7055 provides the additional memory capacity for those applications. The external memory circuitry 7055 may use a variety of memory circuits, for example, dynamic random-access memory (DRAM), synchronous dynamic random-access memory (SDRAM), flash memory, and the like, as persons skilled in the art would understand.

[0100] The memory controller circuitry 7045 provides a means of controlling the functions of the external memory circuitry 7055. The memory controller circuitry 7045 couples to the external memory circuitry 7055 through an interface 7050. The interface 7050 includes one or more signal lines that allow the memory controller circuitry 7045 to control the functions of the external memory circuitry 7055. Generally, the processor circuitry 5035, the processor circuitry 5075, or both, communicate with the memory controller circuitry 7045. Rather than, or in addition to, the processor circuitries 5035 and 5075, other devices within the system or the CPLD may communicate with the memory controller circuitry 7045, as desired.

[0101] The skew phase adjust circuitry 7040 provides more precise control of the timing of the operation of the external memory circuitry 7055 and other high-speed system blocks. The high-speed system blocks typically require finer adjustments to their timing for them to operate at optimum operating points. The skew phase adjust circuitry 7040 provides for electronic adjustment of the timing of high-speed system blocks. For example, the skew phase adjust circuitry 7040 may allow more precise phase adjustment of the clock signal or signals for the external memory circuitry 7055.

[0102] The system blocks described here provide the system's user a high degree of flexibility in configuring the system's architecture and operations. Note that, depending on a particular application, the system may not include all of the described blocks. Based on a particular application's specifications, the system's user may include the components that provide the required functionality but choose not to include other system blocks and components.

[0103] FIG. 8 shows an alternative embodiment 8000 that provides a more detailed architecture of a data-processing system according to the invention. The data-processing system in FIG. 8 includes similar components to those in the system of FIG. 7. Unlike the system in FIG. 7, however, the system shown in FIG. 8 separates the buses into separate data and address/control buses. In other words, bus 5005 in FIG. 7 corresponds to data bus 6015 (labeled “Data Bus 1”) and address/control bus 6005 (labeled “Address/Control Bus 1”) in FIG. 8. Likewise, bus 5010 in FIG. 7 corresponds to data bus 6020 (labeled “Data Bus 2”) and address/control bus 6010 (labeled “Address/Control Bus 2”) in FIG. 8. Each of the data buses 6015 and 6020 facilitates the communication of data signals among the system components. The address/control buses 6005 and 6010, on the other hand, allow the communication of address and control signals. The remaining blocks in FIG. 8 operate in a manner similar to the blocks described in connection with FIG. 7.

[0104] FIGS. 5-8 show CPLD architectures that provide hardware for high-performance data-processing systems according to the invention. As noted above, optimal performance of data-processing systems according to the invention also depends on optimum layout of the circuitry on the silicon substrate. Thus, another aspect of the invention relates to optimal silicon-layout of the CPLD.

[0105] Data-processing systems according to the invention include an integrated-circuit device with a plurality of regions and a common circuitry. The common circuitry couples to, and provides at least one signal to, the plurality of regions of the integrated-circuit device. The layout of the integrated-circuit device, e.g., a CPLD, according to the invention positions the plurality of regions and the common circuitry within the integrated-circuit device so as to improve the latency of the signal or signals provided to the plurality of regions.

[0106] FIGS. 9-10 provide exemplary embodiments of the silicon-layout according to the invention. Each of FIGS. 9A-9B and 10A-10D shows an integrated-circuit device 1005 that includes one or more blocks of common circuitry 9015 and one or more buses or signal distribution circuitries 9010. The common circuitry 9015 includes electronic circuitry, for example, processor circuitry and memory circuitry, that operates at high speed. The integrated-circuit device 1005 also includes a plurality of regions 9005. The regions 9005 include electronic circuitry, for example, logic circuitry and data-processing circuitry. Preferably, the regions 9005 constitute programmable or reconfigurable circuitry, for example, the programmable circuitry available within a CPLD or FPGA.

[0107] The common circuitry 9015 communicates signals to the regions 9005 via the buses 9010. The buses 9010 may carry a wide variety of signals, as desired. For example, the buses 9010 may carry clock signals, control signals, status signals, data signals, address signals, and the like. The buses 9010 may also route logic signals to the various parts or regions of the integrated-circuit device 1005, as desired.

[0108] In an integrated-circuit device according to the invention, the common circuitry 9015 is positioned relative to the regions 9005 so as to improve the latencies of the signals that the common circuitry 9015 communicates to the regions 9005. Generally, the common circuitry 9015 lies at substantially the same physical distance from each of the regions 9005. By way of illustration, the common circuitry 9015 may reside physically within the central area of the integrated-circuit device 1005, with the regions 9005 arranged around the common circuitry 9015. As another illustration, the common circuitry 9015 may reside substantially centrally within the area between at least two of the regions 9005.

[0109] Placing the common circuitry 9015 so that it has substantially the same distance from each of the regions 9005 helps to avoid long physical lengths of the signal lines within the buses 9010. Thus, laying out the integrated-circuit device 1005 according to the invention improves the latencies of the signals within the buses 9010 to each of the plurality of the regions 9005. Moreover, laying out the integrated-circuit device 1005 according to the invention tends to equalize the latency of the signal or signals that the common circuitry 9015 provides to each of the plurality of the regions within the integrated-circuit device 1005.

[0110] FIGS. 9A and 9B illustrate integrated-circuit devices 1005 that have a bus or buses 9010 that run only in one direction within the integrated-circuit device 1005. For example, the bus or buses 9010 may run vertically or horizontally along the integrated-circuit device 1005. FIG. 9A shows an embodiment 9000A of an integrated-circuit device 1005 according to the invention that has a common circuitry 9015 and buses 9010A and 9010B. The buses 9010A and 9010B generally run horizontally along the integrated-circuit device 1005. The integrated-circuit device 1005 also includes regions 9005A and 9005B. The region 9005B neighbors the bus 9010B, whereas the region 9005A neighbors the bus 9010A.

[0111] FIG. 9B shows an alternative embodiment 9000B of an integrated-circuit device 1005 according to the invention. The integrated-circuit device 1005 includes a common circuitry 9015 and a bus 9010C. The integrated-circuit device 1005 also has regions 9005A and 9005B. The common circuitry 9015 generally resides within the central area of the integrated-circuit device 1005. The bus 9010C runs through the common circuitry 9015.

[0112] FIGS. 10A-10D illustrate integrated-circuit devices 1005 that include buses 9010. The buses 9010 run in two directions within the integrated-circuit device 1005. For example, one bus may run horizontally, whereas a second bus may run vertically. As another example, a single bus may have segments or sections that run both horizontally and vertically within the integrated-circuit device 1005.

[0113] FIG. 10A shows a preferred embodiment 10000A of an integrated-circuit device 1005 according to the invention. The integrated-circuit device 1005 has a common circuitry 9015 and a plurality of regions 9005. The integrated-circuit device 1005 also includes a bus 9010, which generally has the shape of a plus sign (“+”). The bus 9010 has a vertical segment 9010D and a horizontal segment 9010E. The bus 9010 generally resides within the central area of the common circuitry 9015 and runs through it, i.e., overlaps at least part of it.

[0114] FIG. 10B shows another embodiment 10000B of an integrated-circuit device 1005 according to the invention. The integrated-circuit device 1005 has a common circuitry 9015 and a plurality of regions 9005. The common circuitry 9015, unlike the common circuitry 9015 of embodiment 10000A in FIG. 10A, does not extend to the edges of the integrated-circuit device 1005. The integrated-circuit device 1005 also includes a bus 9010, which generally has the shape of a plus sign (“+”). The bus 9010 has a vertical segment 9010D and a horizontal segment 9010E. The bus 9010 generally resides within the central area of the common circuitry 9015 and runs through it, i.e., overlaps at least part of it.

[0115] FIGS. 10C and 10D illustrate embodiments 10000C and 10000D, respectively, of an integrated-circuit device 1005 according to the invention. The embodiment 10000C is similar to the embodiment 10000A in FIG. 10A, but the bus 9010 has the shape of a cross, rather than a plus sign. Likewise, the embodiment 10000D is similar to the embodiment 10000B in FIG. 10B, but the bus 9010 has the shape of a cross, rather than a plus sign.

[0116] FIG. 11 illustrates a preferred embodiment 11000 of an integrated-circuit device 1005 according to the invention. The layout of the integrated-circuit device 1005 corresponds generally to that shown in FIG. 10A. Note, however, that one may lay out the integrated-circuit device 1005 in FIG. 11 in other ways, for example, as shown in FIGS. 9 and 10, as desired.

[0117] The embodiment 11000 includes common circuitry 9015 generally positioned in the central part of the integrated-circuit device 1005. In the preferred embodiment shown in FIG. 11, the common circuitry 9015 includes ETM circuitry 11080, AMBA circuitry 7020, memory controller circuitry 7045, external memory circuitry 7055, processor circuitries 5035 and 5075, MAC circuitry 7025, SRAM circuitries 5025 and 5080, UART circuitry 7030, dual-port memory circuitries 5060, buffer circuitry 11110, frame buffer circuitry 7005, FCIC circuitry 1035, and clock generator circuitry 5095. Note, however, that the common circuitry 9015 may include more or fewer blocks of circuitry, as desired. Note also that the common circuitry 9015 may include different blocks of circuitry than those shown in FIG. 11, as persons of ordinary skill in the art who have the benefit of this description of the invention would understand.

[0118] The description of FIGS. 5-8 above includes descriptions of some of the blocks within the integrated-circuit device 1005 shown in FIG. 11. Those blocks include the clock generator circuitry 5095, the processor circuitry 5035 (and the associated PLL circuitry 7060 and skew phase adjust circuitry 7040, which FIG. 11 does not show explicitly), the SRAM circuitry 5020, the AMBA circuitry 7020, the external memory circuitry 7055, the memory controller circuitry 7045, the MAC circuitry 7025, the UART circuitry 7030, the frame-buffer circuitry 7005, the SRAM circuitry 5025, the processor circuitry 5075, the FCIC circuitry 1035, and the dual-port memory circuitry 5060 (and the associated page register circuitry 5065, which FIG. 11 does not show explicitly). FIG. 11 shows the dual-port memory circuitry 5060 as two blocks, one on either side of the bus 9010D. The two blocks constitute an electrically contiguous dual-port memory circuitry 5060, even though they are physically non-contiguous (i.e., they reside on opposite sides of bus 9010D).

[0119] The integrated-circuit device 1005 includes buses 9010D and 9010E. Bus 9010D has a generally vertical orientation, whereas bus 9010E has a generally horizontal orientation. Note that buses 9010D and 9010E intersect generally in the central area of the integrated-circuit device 1005. A buffer circuitry 11110 resides at or near the intersection of bus 9010D and bus 9010E. The buffer circuitry 11110 acts as an amplifier, buffer, or both, for the signals traveling along buses 9010D and 9010E. Because the buses 9010D and 9010E travel nearly from one side of the integrated-circuit device 1005 to its opposite side, the buffer circuitry 11110 helps to prevent the signal degradation that may otherwise occur because of long signal lines. The buffer circuitry 11110 may amplify, buffer, and/or reshape signals on each of the buses 9010D and 9010E, or segments of buses 9010D and 9010E.

[0120] As FIG. 11 shows, the integrated-circuit device 1005 includes four regions 9005C-9005F, labeled “Region 1” through “Region 4.” Those regions include electronic circuitry, preferably programmable or reconfigurable circuitry of the type commonly used in PLDs, CPLDs, and FPGAs. The common circuitry within the integrated-circuit device 1005 provides signals to each of the regions 9005C-9005F via buses 9010D and 9010E. Because of the generally central location of the common circuitry within the integrated-circuit device 1005, the latencies of the signals that the common circuitry provides to the regions 9005C-9005F are improved.

[0121] Note also that in the embodiment 11000, the buses 9010D and 9010Erun in a symmetrical manner with respect to the regions 9005C-9005F.Because of the layout of the buses 9010D and 9010E and given that thecommon circuitry generally resides within the central part of theintegrated-circuit device 1005, the signals that the common circuitryprovides to each of the regions 9005C-9005F tend to experience the sameamount of latency. Put another way, if one randomly distributes signalsfrom the common circuitry across the integrated-circuit device 1005, itsoverall performance may suffer because the signals may reach differentregions 9005C-9005F with different latencies. Placing the commoncircuitry and the buffer circuitry 11110 generally in the central partof the integrated circuit device 1005 tends to allow buses 9010D and9010E to distribute signals from the common circuitry with equallatencies.
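
To illustrate the latency argument above, the following minimal sketch compares signal path lengths from a centrally placed source and from a corner-placed source to four regions. The unit-square die, the region coordinates, and the use of Manhattan distance as a stand-in for latency are assumptions made purely for illustration; they do not come from FIG. 11.

```python
# Hypothetical 2x2 floorplan on a unit die; coordinates are illustrative only.
REGION_CENTERS = {
    "Region 1": (0.25, 0.75),
    "Region 2": (0.75, 0.75),
    "Region 3": (0.25, 0.25),
    "Region 4": (0.75, 0.25),
}


def manhattan(a, b):
    """Path length along horizontal and vertical bus segments."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def latency_spread(source):
    """Return (shortest, longest) path length from source to the regions."""
    lengths = [manhattan(source, c) for c in REGION_CENTERS.values()]
    return min(lengths), max(lengths)


central = latency_spread((0.5, 0.5))  # common circuitry in the middle
corner = latency_spread((0.0, 0.0))   # common circuitry in one corner
print("central placement:", central)
print("corner placement:", corner)
```

Under these assumptions, the central source reaches every region over the same path length, whereas the corner source sees a threefold difference between its nearest and farthest regions.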

[0122] Regions 9005C-9005F (labeled “Region 1” through “Region 4” inFIG. 11) may need additional storage or memory during their operation.Memory circuitries 11020A and 11020B provide additional storage to thecircuitry within regions 9005C and 9005D, respectively. Similarly,memory circuitries 11070A and 11070B provide additional storage to thecircuitry within regions 9005E and 9005F, respectively.

[0123] Dual-port memory circuitries 11030A-11030B and 11150A-11150Bfacilitate the exchange of data and other signals among the regions9005C-9005F. Dual-port memory circuitries 11030A-11030B and11150A-11150B preferably constitute SRAM memory, which can operate atrelatively high speeds. Dual-port memory circuitries 11030A and 11030Bare electrically contiguous, but physically separate (i.e., laid out innon-contiguous parts of the integrated-circuit device 1005), memorycircuits. Regions 9005C and 9005D may access dual-port memorycircuitries 11030A and 11030B simultaneously and exchange data andsignals through them. In other words, bus 9010D runs through (overlapswith) the dual-port memory circuitry that includes dual-port memorycircuitries 11030A and 11030B.

[0124] Similarly, dual-port memory circuitries 11150A and 11150B constitute electrically contiguous, but physically separate (i.e., laid out in non-contiguous parts of the integrated-circuit device 1005), memory circuits. Put another way, bus 9010D runs through (overlaps with) the dual-port memory circuitry that includes dual-port memory circuitries 11150A and 11150B. Regions 9005E and 9005F may both access dual-port memory circuitries 11150A and 11150B at the same time and exchange data and signals through them.

[0125] Note that, if desired, dual-port memory circuitries 11030A, 11030B, 11150A, and 11150B may constitute an electrically contiguous, yet physically non-contiguous, memory circuit. Configuring the dual-port memory circuitries 11030A, 11030B, 11150A, and 11150B as an electrically contiguous memory circuit offers additional flexibility by providing a mechanism for the regions 9005C-9005F to communicate with one another. Configuring them in that manner, however, may result in longer signal propagation times. Thus, depending on the specifications of a given application, one may organize the dual-port memory circuitries 11030A, 11030B, 11150A, and 11150B so as to trade off flexibility and performance, as persons of ordinary skill in the art would understand.

[0126] CPLD ports circuitries 5015 and 5085 facilitate communication andexchange of data and signals among the regions 9005C-9005F. The CPLDports circuitries 5015 and 5085 couple to the bus 9010D, bus 9010E, orboth, as desired (FIG. 11 does not explicitly show the couplingmechanisms). The CPLD ports circuitry 5015 interacts with the region9005D directly. The CPLD ports circuitry 5015, however, interacts withthe region 9005C through the multiplexer (MUX) circuitry 11040A.Similarly, the CPLD ports circuitry 5085 interacts with the region 9005Edirectly. In contrast, the CPLD ports circuitry 5085 interacts with theregion 9005F through the MUX circuitry 11040B.

[0127] The MUX circuitries 11040A and 11040B provide more flexibility by, for example, facilitating interaction with selected blocks or parts of regions 9005C and 9005F, respectively. Note, however, that the CPLD ports circuitries 5015 and 5085 may interact directly with the regions 9005C-9005D and 9005E-9005F, respectively, as desired. As yet another alternative, the CPLD ports circuitries 5015 and 5085 may interact through MUX circuitries with each of the regions 9005C-9005D and 9005E-9005F, respectively, as desired.

[0128] The integrated-circuit device 1005 in FIG. 11 includes circuitry that facilitates debugging and tracing of the operation and timing of the CPLD logic circuitry. An embedded trace macro-cell (ETM) circuitry 11080 monitors the state of the processor circuitry 5035, the processor circuitry 5075, or both, as desired. The ETM circuitry 11080 receives signals from the monitored block (e.g., the processor circuitry 5035, the processor circuitry 5075, or both) and provides tracing and debugging information to a soft ETM circuitry 11050.

[0129] The soft ETM circuitry 11050 provides the remainder of the circuitry for performing the trace or debug operation. The soft ETM circuitry 11050, operating in conjunction with the ETM circuitry 11080, obviates the need for including tracing and debugging hardware within the monitored circuitry, thus saving cost and silicon area. The soft ETM circuitry 11050 allows the user of the integrated-circuit device 1005 to debug the circuitry that the user plans to include within the integrated-circuit device 1005. The user may use a terminal or computer with appropriate tracing and debugging software to facilitate that task. Once the user has debugged the circuitry, the user may eliminate from the integrated-circuit device 1005 the soft ETM circuitry 11050, the ETM circuitry 11080, or both, to further save cost and silicon area.

[0130] A low-voltage differential signaling (LVDS) circuitry 11060 provides a mechanism for the integrated-circuit device 1005 to communicate with other systems, devices, or circuits that contain a similar LVDS circuitry 11060. The LVDS circuitry 11060 provides a serial interface to facilitate communication and exchange of data or signals with other devices. The LVDS circuitry 11060 preferably resides in relatively close proximity to the common circuitry (e.g., the processor circuitry 5035 and the SRAM circuitry 5025) so that it may provide high-speed signals or data from the common circuitry to other systems, devices, or circuits. Relatively close proximity to the common circuitry also allows the LVDS circuitry 11060 to supply signals from other systems, devices, or circuits to the common circuitry with improved latency.

[0131] The integrated-circuit device 1005 includes a plurality of input/output (I/O) circuitries 11010 and dedicated I/O circuitries 11130. The I/O circuitries 11010 and the dedicated I/O circuitries 11130 generally reside near the peripheries of the integrated-circuit device 1005. The I/O circuitries 11010 provide a means for the circuitry within the integrated-circuit device 1005 to communicate with off-chip circuitry. The dedicated I/O circuitries 11130 are preferably in close proximity to the common circuitry, which generally resides in the central part of the integrated-circuit device 1005.

[0132] The dedicated I/O circuitries 11130 enable the FCIC circuitry 1035 to communicate with external hardware, for example, to receive conditional-instruction data. In addition, the dedicated I/O circuitries 11130 provide a way for the various blocks within the common circuitry (e.g., the processor circuitries 5035 and 5075, the external-memory circuitry 7055, and the memory-controller circuitry 7045) to communicate with hardware residing outside the integrated-circuit device 1005.

[0133] Note that FIGS. 9-11 illustrate conceptually the layout ofintegrated-circuit devices 1005 according to the invention. As personsskilled in the art with the benefit of this description of the inventionwill recognize, one may use the inventive concepts with equaleffectiveness in other embodiments. For example, the integrated-circuitdevices 1005 may include different numbers, shapes, and orientations ofthe buses 9010, the common circuitry 9015, and the regions 9005.Moreover, the precise positioning and layout of various blocks withinthe common circuitry 9015 and other parts of the integrated-circuitdevice 1005 lend themselves to other embodiments within the scope of theinvention. The particular layout and positioning of the various blocksdepends on the specifications of a given application, as persons skilledin the art would understand.

[0134] Note also that, based on the specifications of a givenapplication, one may make many modifications to the embodiments shown inFIGS. 5-8 and FIG. 11, as persons skilled in the art with the benefit ofthis description of the invention would understand. For example, one mayinclude other circuit blocks within the integrated-circuit device 1005(or within a data-processing system built around the integrated-circuitdevice 1005), as desired. Generally, one may include routing andswitching mechanisms that would facilitate communication betweenvirtually any of the blocks. Moreover, the various blocks within andoutside the integrated-circuit device 1005 may operate at differentspeeds or with differing electrical characteristics. One may employbuffer circuitries, interface circuitries, or both, to match the speedsand electrical characteristics of the blocks to facilitate theircommunication and interaction with each other.

[0135] Fast Conditional-Instruction Controller Circuitry

[0136] As noted above, one aspect of the invention relates to improving system latency when responding to external signals or data that the processor circuitry uses to execute instructions. This aspect of the invention proves advantageous in a variety of data-processing applications, particularly when a data-processing system according to the invention executes a software routine designed for real-time execution. The real-time software routine may constitute a multimedia algorithm, a video-processing algorithm (say, an MPEG-II algorithm), a data-communication or networking algorithm, a signal-processing routine, a control-system algorithm, and the like.

[0137] The dependence of instruction execution on external variables ordata often arises in the conditional-instruction context. FIG. 12depicts a flow-chart 12000 of a software routine. The routine includesdecision blocks 12020, 12030, 12040, and 12050. The flow of execution atdecision blocks 12020, 12030, 12040, and 12050 depends on the values ofvariables D₀, D₁, D₂, and D₃, respectively. The processor circuitrywithin the data-processing system executes conditional instructionswhose outcome depends on the values of the variables. The conditionalinstructions typically result in branching within the flow of thesoftware routine. In other words, the decision blocks examine (say, fora compare-and-branch instruction) a binary variable or signal whosevalue equals or depends on the values of the variables D₀, D₁, D₂, andD₃. Depending on the results of that examination or comparison, the flowof the software routine may change. Thus, the variables D₀, D₁, D₂, andD₃ constitute conditional-instruction data. The conditional-instructiondata may correspond to the occurrence of an event, either within the PLDor CPLD that includes data-processing circuitry according to theinvention, or outside the data-processing system that includes FCICcircuitry according to the invention. The event may constitute aphysical phenomenon or an action by a user or similar other occurrences.

[0138] The conditional-instruction data D₀, D₁, D₂, and D₃ may representvalues from modules or external hardware that perform specific tasks. Intypical data-processing systems, the conditional-instruction data mayrepresent signals from parts of the system other than the processorcircuitry, or from relatively remote parts of the system, peripherals,or other data-processing systems or circuitry.

[0139] To illustrate, consider, for example, decision block 12030. Whenthe routine reaches the block 12030, it examines or tests the value ofvariable D₁. If D₁=0, then the routine flows to Instruction 1 (block12060). On the other hand, if D₁≠0, then the routine proceeds toInstruction 2 (block 12070). The data-processing system may receive froman external source signals or data that represent one or more of theconditional-instruction data D₀, D₁, D₂, and D₃.
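
The branch at decision block 12030 can be written as a short sketch. The function name and the returned strings are hypothetical labels chosen for illustration; only the test on D₁ and the two destination blocks come from the description above.

```python
def decision_block_12030(d1):
    """Return the block that the routine flows to after testing D1."""
    if d1 == 0:
        return "Instruction 1 (block 12060)"
    return "Instruction 2 (block 12070)"


# The flow of the routine changes with the conditional-instruction datum.
print(decision_block_12030(0))
print(decision_block_12030(1))
```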

[0140] Real-time execution of the routine in flow chart 12000 depends on a number of factors, including the latency that the data-processing system exhibits in response to a change in the conditional-instruction data D₀, D₁, D₂, and D₃. In other words, the real-time execution of the routine depends on the system's latency when it responds to real-time changes in conditional-instruction data originating from external sources. As described below in more detail, the conditional-instruction data, for example, D₀, D₁, D₂, and D₃, serve as inputs to a zero-page memory circuitry within FCIC circuitry according to the invention.

[0141] FIG. 13 shows an embodiment 13000 of a zero-page memory circuitry 13010. The zero-page memory circuitry 13010 constitutes a part of FCIC circuitry according to the invention. The zero-page memory circuitry 13010 includes a memory circuitry with a storage capacity that depends on a particular application. Each word in the zero-page memory circuitry may include one or more bits, as desired. The zero-page memory circuitry 13010 in the embodiment 13000 includes 32-bit words, although one may use other word widths, as persons skilled in the art would understand.

[0142] The zero-page memory circuitry 13010 typically occupies thelower-end of the address map for a processor circuitry. Consider, forexample, a 32-bit processor circuitry. The processor circuitry canpotentially address 2³² memory locations. If the memory architecture ofthe system is a byte wide, then the system has a 4 giga-byte (4 GB)memory address space. The zero-page memory circuitry 13010 resides inthe lower part of that address space. The number of memory locationsdedicated to the zero-page memory circuitry 13010 varies depending onthe specifications of a given system, as persons skilled in the artwould understand.
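
The address-space arithmetic above can be checked with a brief sketch. The zero-page size of 64 words is a hypothetical value chosen for illustration, since the description leaves the number of locations to the specifications of a given system; the 32-bit address width, byte-wide memory, and 32-bit word width follow the example in the text.

```python
ADDRESS_BITS = 32                         # 32-bit processor circuitry
ADDRESS_SPACE_BYTES = 2 ** ADDRESS_BITS   # byte-wide memory architecture

WORD_BYTES = 4                            # 32-bit words
ZERO_PAGE_WORDS = 64                      # hypothetical size, not from the text
ZERO_PAGE_BASE = 0x0000_0000              # low end of the address map
ZERO_PAGE_LIMIT = ZERO_PAGE_BASE + ZERO_PAGE_WORDS * WORD_BYTES


def in_zero_page(address):
    """Report whether a byte address falls inside the zero-page range."""
    return ZERO_PAGE_BASE <= address < ZERO_PAGE_LIMIT


print(ADDRESS_SPACE_BYTES // 1024 ** 3, "GB address space")
print(hex(ZERO_PAGE_LIMIT - 1), "is the last zero-page byte address")
```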

[0143] The processor circuitry can access the zero-page memory circuitry 13010 at high speed. Two factors make possible the high-speed access of the zero-page memory circuitry 13010 by the processor circuitry. First, the zero-page memory circuitry 13010 typically constitutes high-speed memory. In FCIC circuitry according to the invention, the zero-page memory circuitry 13010 preferably constitutes SRAM memory circuitry. Thus, the processor circuitry may access the contents of the zero-page memory circuitry 13010 in a relatively short amount of time. Second, because the zero-page memory circuitry 13010 occupies the low end of the system memory map, the processor circuitry may access it without adding any offset to the memory address.

[0144] Referring to FIG. 13, a plurality of signal lines 13030 couple tomemory cells 13020 within the zero-page memory circuitry 13010. Morespecifically, signal line 13030A couples to memory cell 13020A, signalline 13030B couples to memory cell 13020B, and so on. The signal lines13030 also couple to the sources of the conditional-instruction data. Asnoted above, the sources of the conditional-instruction data may resideoutside the data-processing system or outside a PLD or CPLD according tothe invention that performs data-processing tasks. For improvedperformance, the data-processing system should receive theconditional-instruction data through high-speed, low-latency signalpaths, for example, through the dedicated I/O circuitry 11130 shown inFIG. 11.

[0145] FIG. 13 denotes the conditional-instruction data as D₀ through D₃. The signal line 13030A couples to the source of conditional-instruction datum D₀, the signal line 13030B couples to the source of conditional-instruction datum D₁, and so on. The signal lines 13030 provide the conditional-instruction data D₀-D₃₁ to the zero-page memory circuitry 13010 in an asynchronous manner. Put another way, the signal lines 13030 write the conditional-instruction data D₀-D₃₁ to the zero-page memory circuitry 13010 without synchronizing to a system clock signal or to a processor-circuitry clock signal. The asynchronous writing of the conditional-instruction data into the zero-page memory circuitry 13010 helps in part to improve the overall system latency.

[0146] The processor circuitry reads the contents of the zero-pagememory circuitry 13010 in a synchronous manner. In other words, theprocessor circuitry uses a clock signal to access and read the zero-pagememory circuitry 13010. Recall from the description above that theprocessor circuitry uses the conditional-instruction data to executeconditional instructions. In FCIC circuitry according to the invention,the processor circuitry may access the conditional-instruction datathrough the zero-page memory circuitry 13010, thus overcoming therelatively long delays associated with interrupt-driven orregister-input data-processing systems. Thus, data-processing systemsthat include FCIC circuitry according to the invention exhibit lowlatency when executing routines that require fast response, such asreal-time data-processing routines.

[0147]FIG. 14A shows an architectural block diagram of the zero-pagememory circuitry 13010. The zero-page memory circuitry 13010 preferablyresides within an SRAM circuitry 14010. The SRAM circuitry 14010includes high-speed memory circuitry. The zero-page memory circuitry13010 accepts conditional-instruction data via the signal lines 13030.The zero-page circuitry 13010 uses memory cells that are similar toconventional SRAM memory cells. Unlike a typical conventional SRAMmemory cell, however, memory cells within the zero-page circuitry 13010accept conditional-instruction data asynchronously.

[0148] The SRAM circuitry 14010 couples to the bus 14040. The bus 14040may include a variety of signal lines, for example, address, data,status, and control signal lines. The bus 14040 may be the same orsimilar to the buses 5005 and 5010 in FIGS. 5 and 7, as desired. Theprocessor circuitry (not shown explicitly in FIG. 14A) may communicatewith the zero-page memory circuitry 13010 and the SRAM circuitry 14010via the bus 14040. The processor circuitry may constitute the processorcircuitry 5035 or the processor circuitry 5075 of FIGS. 5-8 and 11, asdesired.

[0149] FIG. 14B shows a variation of the circuitry shown in FIG. 14A. In FIG. 14B, the bus circuitry includes two buses: an address/control bus 14050 and a data bus 14060. The circuitry shown in FIG. 14B may be suitable for use in the systems shown in FIGS. 6 and 8. Recall that, in the systems of FIGS. 6 and 8, the bus circuitry includes two separate bus structures that include address/control buses 6005 and 6015, and data buses 6010 and 6020. The bus 14050 may be similar to or the same as the address/control buses 6005 or 6015. Similarly, the bus 14060 may be similar to or the same as the data buses 6010 and 6020. The processor circuitry (not shown explicitly in FIG. 14B) may communicate with the zero-page memory circuitry 13010 and the SRAM circuitry 14010 via the buses 14050 and 14060. The processor circuitry may constitute the processor circuitry 5035 or the processor circuitry 5075 of FIGS. 5-8 and 11, as desired.

[0150]FIG. 15 shows timing diagrams 15000 corresponding to the operationof FCIC circuitry according to the invention. The timing diagrams 15000include a waveform 15010 of a clock signal. The clock signal 15010 mayconstitute the system's clock or the clock that provides timing signalsfor the processor circuitry. The clock signal has a period denoted asT_(clk). The clock signal also has a setup time shown as t_(su). Thetiming diagram 15000 also includes a waveform 15020 of aconditional-instruction data signal. The conditional-instruction datasignal may, for example, correspond to one of the variables D₀, D₁, D₂,or D₃ in FIG. 12.

[0151] A third waveform 15030 in FIG. 15 illustrates the data signalthat the processor circuitry uses to execute the conditionalinstruction. The conditional instruction may constitute a conditionalcompare, branch, test, or other instruction. The processor obtains thedata signal by accessing the zero-page memory circuitry (either directlyor through a memory controller or other type of circuitry) and reading aconditional-instruction datum.

[0152] Within a time period shown as t₀ in FIG. 15, the processor circuitry executes the instruction and provides the results. The processor circuitry may provide the result to a port, a register, a memory location, and the like, as desired. Execution of the software routine then continues with the next instruction in the flow diagram of the software routine or algorithm. The timing diagram 15000 includes a fourth waveform 15040. The waveform 15040 corresponds to a result of the execution by the processor circuitry of the conditional instruction. Note that the results of the conditional instruction, for example, a conditional branch instruction, depend on the conditional-instruction datum. Note also that, with the signal timing shown in FIG. 15, the data-processing system according to the invention responds to a change in the external data or signal within t₀+T_(clk).

[0153] On average, a data-processing system that uses FCIC circuitry according to the invention exhibits a latency of half a clock cycle for reading the value of the conditional-instruction datum or data. In other words, assuming a normal distribution in the changes in the conditional-instruction datum, over a relatively large number of read operations, the system will on average read and respond to a change in the datum within a half clock cycle, or T_(clk)/2. On the other hand, if the conditional-instruction datum changes just after the positive edge of the clock signal, the system does not use the new value until the next clock cycle. In the latter case, the system responds to the change in the datum in one clock cycle, or T_(clk). Thus, the system response latency, t_(r), has the following lower and upper bounds: $\frac{1}{2} T_{clk} \leq t_{r} \leq T_{clk}$.
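
The half-cycle average and one-cycle worst case can be explored with a small Monte Carlo sketch. It assumes, for simplicity, that each change in the datum arrives at a phase drawn uniformly at random within the clock period and that the processor circuitry samples the zero-page memory on the next positive clock edge; the setup time t_(su) is ignored. The function name, trial count, and seed are illustrative choices, not part of the described circuitry.

```python
import random


def mean_and_max_wait(t_clk, trials=100_000, seed=0):
    """Simulate the wait from a datum change to the next positive clock edge."""
    rng = random.Random(seed)
    waits = []
    for _ in range(trials):
        change_time = rng.uniform(0.0, t_clk)  # phase of the change (assumed uniform)
        waits.append(t_clk - change_time)      # time remaining until the next edge
    return sum(waits) / trials, max(waits)


mean_wait, max_wait = mean_and_max_wait(t_clk=1.0)
print("mean wait in clock periods:", round(mean_wait, 3))
print("longest wait in clock periods:", round(max_wait, 3))
```

Under these assumptions, the mean wait settles near half a clock period, and no individual wait exceeds one clock period.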

[0154] In contrast, in an interrupt-driven data-processing system, the latency may be on the order of 20-30 clock cycles, or even higher, each time the system examines the values of a conditional-instruction datum or other comparable external variable. When one considers that a main software routine may invoke the routine shown in FIG. 12 numerous times, the potential impact on the overall latency of the data-processing system may become too severe for satisfactory performance. The interrupt-driven system may therefore exhibit too much latency to respond in real time to changes in the variables or data signals. As a result, the overall system performance may suffer or fail to meet its specifications and requirements.
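
A rough calculation shows how the per-check latency accumulates. The invocation count of 1,000 is a hypothetical figure chosen for illustration; the 20-30 cycle range for the interrupt-driven case and the one-cycle upper bound for the FCIC case come from the description above.

```python
def total_latency_cycles(invocations, cycles_per_check):
    """Accumulated latency, in clock cycles, over repeated datum checks."""
    return invocations * cycles_per_check


INVOCATIONS = 1_000  # hypothetical number of times the routine runs

interrupt_low = total_latency_cycles(INVOCATIONS, 20)
interrupt_high = total_latency_cycles(INVOCATIONS, 30)
fcic_worst = total_latency_cycles(INVOCATIONS, 1)  # upper bound of T_clk per check

print("interrupt-driven:", interrupt_low, "to", interrupt_high, "cycles")
print("FCIC worst case:", fcic_worst, "cycles")
```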

[0155]FIG. 16A shows an embodiment 16000A of a memory cell for use inzero-page circuitry according to the invention. The memory cell providesfor asynchronous writing of conditional-instruction data to the cell.The memory cell also features synchronous reading of theconditional-instruction data from the cell. Thus, the memory cellconstitutes asynchronous-write, synchronous-read (AWSR) circuitry.

[0156] Similar to a conventional static memory (SRAM) cell, the memorycell in FIG. 16A uses transistors T₁, T₂, T₃, and T₄. Transistors T₁ andT₂ are p-channel devices, whereas transistors T₃ and T₄ are n-channeldevices. The memory cell also includes n-channel access transistors T₆and T₇. By providing appropriate control signals to gate terminals16110A and 16110B, respectively, of transistors T₆ and T₇, one may reada value from the memory cell or write a value into the memory cell. Thecontrol signals constitute row and column decode signals (labeled as“R/C Decode” in FIG. 16).

[0157] The normal operation of the cell (i.e., operation as a regular memory cell) parallels the operation of a conventional SRAM cell. Suppose, for example, that a data source seeks to write a logic datum, say, binary logic “0” (a voltage level typically close to the ground potential or V_(SS)), into the cell. To do so, the data source places the complement of the logic datum, or a binary “1” (i.e., the complement of the logic value that the data source seeks to write into the cell), on write line 16090. The R/C Decode line 16110B then activates and couples the gate of transistor T₃ to the logic “1” level. Transistor T₃ begins to turn on and its drain voltage starts to drop. The dropping voltage on the drain of transistor T₃ pulls the gate of transistor T₄ towards logic “0” (a voltage level typically close to the ground voltage or V_(SS)). Transistor T₄ thus starts to turn off and its drain voltage begins to rise. Because of positive feedback within the circuit, the rising drain voltage of transistor T₄ in turn causes transistor T₃ to turn on harder. The positive feedback eventually causes T₃ to turn fully on and T₄ to turn fully off. Thus, the drain voltage of T₃ falls to nearly the ground potential, or V_(SS). In other words, the cell holds or stores the desired logic “0” value.

[0158] One may write a logic “1” value into the cell in a similarmanner. The data source places the complement of the desired logicdatum, or a binary “0” (i.e., the complement of the logic value that thedata source seeks to write into the cell), on write line 16090. The R/CDecode line 16110B then activates and couples the gate of transistor T₃to the logic “0” level. Transistor T₃ begins to turn off and its drainvoltage starts to rise. The rising voltage on the drain of transistor T₃pulls the gate of transistor T₄ towards logic “1.” Transistor T₄ thusstarts to turn on and its drain voltage begins to fall. The fallingdrain voltage of transistor T₄ in turn causes transistor T₃ to turn offeven more. The positive feedback eventually causes T₃ to turn fully offand T₄ to turn fully on. Thus, the drain voltage of T₃ rises to nearlythe supply voltage, or V_(DD). In other words, the cell holds or storesthe desired logic “1” value.

[0159] To retrieve the value stored within the cell, one uses the read operation. The read operation commences by R/C Decode line 16110A activating (going to a logic “1” level), thus turning on transistor T₆. Transistor T₆ operates as a pass transistor. With transistor T₆ in the on state, the voltage level at the drain of transistor T₆, which represents the binary-logic value stored in the cell, appears on read line 16100. A buffer or read amplifier circuitry (not shown explicitly in FIG. 16A) may then sample or read the stored datum from line 16100 and provide it to other circuitry, as desired.

[0160] Note that the normal write operation, e.g., the activation of theR/C Decode line 16110B, is typically synchronous. In other words, thetiming of the normal write operation occurs in synchronization with aclock signal. Note also that, similar to the normal write operation, theread operation, e.g., the activation of the R/C Decode line 16110A, istypically synchronous. Put another way, the timing of the read operationoccurs in synchronization with a clock signal.

[0161] The memory cell in FIG. 16A also includes an n-channeltransistor, T₅, in parallel with transistor T₄. Transistor T₅facilitates the asynchronous writing of a conditional-instruction datum,say, D_(n), into the memory cell. In contrast to the normal writeoperation described above, the asynchronous write operation does notsynchronize to a clock signal. The asynchronous write operation occursas follows: Suppose that D_(n) has a binary value of “0.” A binary “0”level on D_(n) asserts a binary “0” level at the gate terminal 16080 oftransistor T₅ and turns it off. Transistor T₅ therefore does not conductany current (except for negligible leakage and parasitic currents).Thus, transistor T₅ does not affect the operation of the memory cell(i.e., the normal write and read operations).

[0162] Suppose that D_(n) rises to a logic “1” level. A logic “1” levelon the gate terminal 16080 of transistor T₅ turns it on and causes itsdrain voltage and the gate voltage of transistor T₃ to drop. TransistorT₃ begins to turn off and its drain voltage starts to rise. The risingvoltage on the drain of transistor T₃ pulls the gate of transistor T₄towards logic “1.” Transistor T₄ thus starts to turn on and its drainvoltage begins to fall. The falling drain voltage of transistor T₄ inturn causes transistor T₃ to turn off even more. The positive feedbackeventually causes T₃ to turn fully off and T₄ to turn fully on. Thus,the drain voltage of T₃ rises to nearly the supply voltage, or V_(DD),and the cell holds the desired D_(n) value of binary “1.”

[0163] Subsequently, the processor circuitry or other circuitry that uses the value of conditional-instruction datum D_(n) reads the memory cell as part of instruction execution and, generally, as part of the data-processing algorithm. The memory cell provides a D_(n) value of binary “1,” and the processor circuitry executes the conditional instruction based on that value. The processor circuitry may also reset the memory cell (by using, for example, the normal write cycle), thus making it ready to accept a subsequent value of D_(n). Thus, rather than using an interrupt system with a relatively long latency, a data-processing system may use FCIC circuitry according to the invention to reduce the latency within the system.
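
The asynchronous-write, synchronous-read behavior described above can be summarized in a behavioral sketch. It models only the logical effect of the cell (a high D_(n) sets the stored value at once, a low D_(n) leaves the cell undisturbed, and the processor circuitry reads and resets the cell on its own clock); it does not model transistor-level operation or timing. The class and method names are hypothetical.

```python
class AwsrCell:
    """Behavioral model of one asynchronous-write, synchronous-read cell."""

    def __init__(self):
        self.stored = 0

    def drive_dn(self, level):
        """Asynchronous input: a 1 sets the cell at once; a 0 has no effect."""
        if level == 1:
            self.stored = 1

    def clocked_write(self, value):
        """Normal, clock-synchronized write, used here to reset the cell."""
        self.stored = value

    def clocked_read(self):
        """Clock-synchronized read by the processor circuitry."""
        return self.stored


cell = AwsrCell()
cell.drive_dn(1)                 # external event raises D_n between clock edges
observed = cell.clocked_read()   # processor samples the cell on its clock
cell.clocked_write(0)            # processor resets the cell for the next event
print("value read:", observed, "value after reset:", cell.clocked_read())
```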

[0164] FIG. 16B illustrates an embodiment 16000B of another memory cell for use in FCIC circuitry according to the invention. The memory cell in FIG. 16B functions similarly to the memory cell of FIG. 16A. In the memory cell of FIG. 16B, however, the gate terminals of transistors T₁ and T₂ couple to a current-mirror circuitry 16140. The current-mirror circuitry 16140 includes a transistor T₈ and a current source I₀. The current-mirror circuitry 16140 operates as a conventional current mirror, known to persons of ordinary skill in the art. A current substantially equal to I₀ flows through either T₁ or T₂, depending, respectively, on whether T₃ or T₄ is in the on state. Note that several memory cells may use the same current-mirror circuitry 16140. That sharing of circuitry simplifies the layout of the memory cells and allows denser silicon integration. The current-mirror circuitry 16140, however, does not alter the overall function that the memory cell performs.

[0165] Note that the zero-page memory circuitry according to the invention preferably uses static-type memory circuitry. One may use other types of memory circuitry, for example, RAM, SDRAM, flash memory, and the like, as desired. The choice of memory technology depends on a number of factors, such as system performance, cost, etc., as persons skilled in the art would understand. Any modifications to the embodiments of FIGS. 16A and 16B will also be within the knowledge of persons skilled in the art who have the benefit of this description of the invention. Note also that one may use active-low logic circuitry by making modifications to the circuitry shown and described here. Those modifications will also fall within the knowledge of a person of ordinary skill in the art.

[0166] The embodiments described here and shown in the accompanying diagram mainly address the application of the inventive concepts to real-time data-processing systems. Note, however, that one may effectively apply the inventive concepts to other data-processing systems and applications, as desired, for example, in data-processing systems whose specifications do not include real-time data processing. The inventive concepts described here potentially improve the latency of a wide variety of data-processing systems, particularly high-performance systems.

[0167] Further modifications and alternative embodiments of this invention will be apparent to persons skilled in the art in view of this description of the invention. Accordingly, this description teaches those skilled in the art the manner of carrying out the invention and is to be construed as illustrative only. The forms of the invention shown and described should be taken as the presently preferred embodiments.

[0168] Persons skilled in the art may make various changes in the shape, size and arrangement of parts without departing from the scope of the invention described in this document. For example, persons skilled in the art may substitute equivalent elements for the elements illustrated and described here. Moreover, persons skilled in the art after having the benefit of this description of the invention may use certain features of the invention independently of the use of other features, without departing from the scope of the invention.

I claim:
 1. A configurable integrated-circuit device, comprising: a plurality of regions that each include configurable electronic circuitry; and common circuitry that provides at least one signal to at least two regions of the plurality of regions, wherein the common circuitry and the at least two regions are positioned within the configurable integrated-circuit device so as to improve the latencies of the at least one signal to each of the at least two regions.
 2. The configurable integrated-circuit device of claim 1, in which the common circuitry comprises at least one processor circuitry.
 3. The configurable integrated-circuit device of claim 2, in which the common circuitry further comprises at least one memory circuitry that communicates with the at least one processor circuitry.
 4. The configurable integrated-circuit device of claim 3, further comprising at least one bus circuitry that couples the common circuitry to the plurality of regions within the configurable integrated-circuit device.
 5. The configurable integrated-circuit device of claim 4, in which the common circuitry further comprises a clock generator circuitry that provides at least one clock signal.
 6. The configurable integrated-circuit device of claim 5, further comprising at least one port circuitry that facilitates communication between the at least two regions.
 7. The configurable integrated-circuit device of claim 6, further comprising at least one dual-port memory circuitry coupled to the at least two regions.
 8. The configurable integrated-circuit device of claim 7, further comprising input/output (IO) circuitry that facilitates communication between the common circuitry and circuitry external to the configurable integrated-circuit device.
 9. The configurable integrated-circuit device of claim 8, in which the common circuitry is positioned at substantially the same distance from each region of the plurality of regions.
 10. The configurable integrated-circuit device of claim 8, in which the common circuitry is positioned at substantially the same distance from each of the at least two regions of the plurality of regions.
 11. A data-processing system including a configurable integrated-circuit device according to claim 1, the data-processing system further comprising at least one peripheral circuitry coupled to the configurable integrated-circuit device.
 12. A data-processing system including a configurable integrated-circuit device according to claim 2, the data-processing system further comprising at least one peripheral circuitry coupled to the configurable integrated-circuit device.
 13. A data-processing system including a configurable integrated-circuit device according to claim 3, the data-processing system further comprising at least one peripheral circuitry coupled to the configurable integrated-circuit device.
 14. A data-processing system including a configurable integrated-circuit device according to claim 4, the data-processing system further comprising at least one peripheral circuitry coupled to the configurable integrated-circuit device.
 15. A data-processing system including a configurable integrated-circuit device according to claim 5, the data-processing system further comprising at least one peripheral circuitry coupled to the configurable integrated-circuit device.
 16. A configurable integrated-circuit device, comprising: a plurality of regions that each include configurable electronic circuitry; and common circuitry that provides at least one signal to at least two regions of the plurality of regions, wherein the common circuitry and the at least two regions are positioned within the configurable integrated-circuit device so that the latencies of the at least one signal to each of the at least two regions tend to be equalized.
 17. The configurable integrated-circuit device of claim 16, in which the common circuitry comprises at least one processor circuitry.
 18. The configurable integrated-circuit device of claim 17, in which the common circuitry further comprises at least one memory circuitry that communicates with the at least one processor circuitry.
 19. The configurable integrated-circuit device of claim 18, further comprising at least one bus circuitry that couples the common circuitry to the plurality of regions within the configurable integrated-circuit device.
 20. The configurable integrated-circuit device of claim 19, in which the common circuitry further comprises a clock generator circuitry that provides at least one clock signal.
 21. The configurable integrated-circuit device of claim 20, further comprising at least one port circuitry that facilitates communication between the at least two regions.
 22. The configurable integrated-circuit device of claim 21, further comprising at least one dual-port memory circuitry coupled to the at least two regions.
 23. The configurable integrated-circuit device of claim 22, further comprising input/output (IO) circuitry that facilitates communication between the common circuitry and circuitry external to the configurable integrated-circuit device.
 24. The configurable integrated-circuit device of claim 23, in which the common circuitry is positioned centrally with respect to the plurality of regions.
 25. The configurable integrated-circuit device of claim 23, in which the common circuitry is positioned centrally with respect to the at least two regions of the plurality of regions.
 26. A data-processing system including a configurable integrated-circuit device according to claim 16, the data-processing system further comprising at least one peripheral circuitry coupled to the configurable integrated-circuit device.
 27. A data-processing system including a configurable integrated-circuit device according to claim 17, the data-processing system further comprising at least one peripheral circuitry coupled to the configurable integrated-circuit device.
 28. A data-processing system including a configurable integrated-circuit device according to claim 18, the data-processing system further comprising at least one peripheral circuitry coupled to the configurable integrated-circuit device.
 29. A data-processing system including a configurable integrated-circuit device according to claim 19, the data-processing system further comprising at least one peripheral circuitry coupled to the configurable integrated-circuit device.
 30. A data-processing system including a configurable integrated-circuit device according to claim 20, the data-processing system further comprising at least one peripheral circuitry coupled to the configurable integrated-circuit device.
 31. A programmable logic device, comprising: a plurality of regions that each include configurable electronic circuitry; bus circuitry coupled to the plurality of regions; and common circuitry coupled to the bus circuitry, wherein the common circuitry provides a signal to the plurality of regions through the bus circuitry, and wherein the common circuitry and the plurality of regions are positioned within the programmable logic device so as to improve the latencies of the signal to each region.
 32. The programmable logic-device of claim 31, in which the common circuitry comprises at least one processor circuitry.
 33. The programmable logic-device of claim 32, in which the common circuitry further comprises at least one memory circuitry configured to communicate with the at least one processor circuitry.
 34. The programmable logic-device of claim 33, further comprising configurable interconnect circuitry coupled to each region.
 35. The programmable logic-device of claim 34, further comprising input-output (I/O) circuitry coupled to the common circuitry.
 36. The programmable logic-device of claim 35, in which the input-output circuitry is further coupled to the plurality of regions.
 37. The programmable logic-device of claim 36, further comprising a memory controller circuitry adapted to communicate with memory circuitry external to the programmable logic-device.
 38. The programmable logic-device of claim 37, in which the common circuitry further comprises a dual-port memory circuitry adapted to facilitate communication between a plurality of processor circuitries within the common circuitry.
 39. The programmable logic-device of claim 38, which includes four regions, each region positioned in a quadrant of the programmable logic-device.
 40. The programmable logic-device of claim 39, in which the bus circuitry includes a horizontal bus circuitry and a vertical bus circuitry that intersects the horizontal bus circuitry, wherein the intersection of the horizontal bus circuitry and the vertical bus circuitry overlaps at least in part with the common circuitry.
 41. A data-processing system including a programmable logic-device according to claim 31, the data-processing system further comprising at least one peripheral circuitry coupled to the programmable logic-device.
 42. A data-processing system including a programmable logic-device according to claim 32, the data-processing system further comprising at least one peripheral circuitry coupled to the programmable logic-device.
 43. A data-processing system including a programmable logic-device according to claim 33, the data-processing system further comprising at least one peripheral circuitry coupled to the programmable logic-device.
 44. A data-processing system including a programmable logic-device according to claim 34, the data-processing system further comprising at least one peripheral circuitry coupled to the programmable logic-device.
 45. A data-processing system including a programmable logic-device according to claim 35, the data-processing system further comprising at least one peripheral circuitry coupled to the programmable logic-device.
 46. A data-processing system including a programmable logic-device according to claim 36, the data-processing system further comprising at least one peripheral circuitry coupled to the programmable logic-device.
 47. A method of improving latency in a configurable integrated-circuit device, comprising: providing the configurable integrated-circuit device; partitioning the configurable integrated-circuit device into a plurality of regions that each include configurable electronic circuitry; including within the configurable integrated-circuit device a common circuitry that provides at least one signal to at least two regions of the plurality of regions; and positioning the common circuitry and the at least two regions within the configurable integrated-circuit device so as to improve the latencies of the at least one signal to each of the at least two regions.
 48. The method of claim 47, which further comprises including in the common circuitry at least one processor circuitry.
 49. The method of claim 48, which further comprises including in the common circuitry at least one memory circuitry configured to communicate with the at least one processor circuitry.
 50. The method of claim 49, which further comprises including in the configurable integrated-circuit device at least one bus circuitry adapted to couple the common circuitry to the plurality of regions of the integrated-circuit device.
 51. The method of claim 50, which further comprises including in the common circuitry a clock generator circuitry adapted to provide at least one clock signal.
 52. The method of claim 51, which further comprises including in the integrated-circuit device at least one port circuitry adapted to facilitate communication between the at least two regions.
 53. The method of claim 52, which further comprises including in the integrated-circuit device at least one dual-port memory circuitry coupled to the at least two regions.
 54. The method of claim 53, which further comprises including in the integrated-circuit device at least one dedicated input/output (IO) circuitry adapted to facilitate communication between the common circuitry and circuitry external to the integrated-circuit device.
 55. The method of claim 54, which further comprises positioning the common circuitry at substantially the same distance from each region of the plurality of regions.
 56. The method of claim 54, which further comprises positioning the common circuitry at substantially the same distance from each of the at least two regions of the plurality of regions.
 57. A method of improving latency in a configurable integrated-circuit device, comprising: providing the configurable integrated-circuit device; partitioning the configurable integrated-circuit device into a plurality of regions that each include configurable electronic circuitry; including within the configurable integrated-circuit device a common circuitry that provides at least one signal to at least two regions of the plurality of regions; and positioning the common circuitry and the at least two regions within the configurable integrated-circuit device so that the latencies of the at least one signal to each of the at least two regions tend to be equalized.
 58. The method of claim 57, which further comprises including in the common circuitry at least one processor circuitry.
 59. The method of claim 58, which further comprises including in the common circuitry at least one memory circuitry configured to communicate with the at least one processor circuitry.
 60. The method of claim 59, which further comprises including in the configurable integrated-circuit device at least one bus circuitry adapted to couple the common circuitry to the plurality of regions of the integrated-circuit device.
 61. The method of claim 60, which further comprises including in the common circuitry a clock generator circuitry adapted to provide at least one clock signal.
 62. The method of claim 61, which further comprises including in the integrated-circuit device at least one port circuitry adapted to facilitate communication between the at least two regions.
 63. The method of claim 62, which further comprises including in the integrated-circuit device at least one dual-port memory circuitry coupled to the at least two regions.
 64. The method of claim 63, which further comprises including in the integrated-circuit device at least one dedicated input/output (IO) circuitry adapted to facilitate communication between the common circuitry and circuitry external to the integrated-circuit device.
 65. The method of claim 64, which further comprises positioning the common circuitry at substantially the same distance from each region of the plurality of regions.
 66. The method of claim 64, which further comprises positioning the common circuitry at substantially the same distance from each of the at least two regions of the plurality of regions.